Abstract



We consider packing LP's with $m$ rows where all constraint coefficients are normalized to be in the
unit interval. The $n$ columns arrive in random order and the goal is to set the corresponding decision
variables irrevocably when they arrive so as to obtain a feasible solution maximizing the expected reward.
Previous $(1-\epsilon)$-competitive algorithms require the right-hand side of the LP to be $\Omega(\frac{m}{\epsilon^2}\log\frac{n}{\epsilon})$, a bound
that worsens with the number of columns and rows. However, the dependence on the number of columns
is not required in the single-row case and known lower bounds for the general case are also independent
of $n$.

Our goal is to understand whether the dependence on $n$ is required in the multi-row case, making it
fundamentally harder than the single-row version. We refute this by exhibiting an algorithm which is
$(1-\epsilon)$-competitive as long as the right-hand sides are $\Omega(\frac{m^2}{\epsilon^2}\log\frac{m}{\epsilon})$. Our techniques refine previous PAC-learning
based approaches which interpret the online decisions as linear classifications of the columns
based on sampled dual prices. The key ingredient of our improvement comes from a non-standard
covering argument together with the realization that only when the columns of the LP belong to few 1-d
subspaces can we obtain small such covers; bounding the size of the cover constructed also relies on the
geometry of linear classifiers. General packing LP's are handled by perturbing the input columns, which
can be seen as making the learning problem more robust.

1 Introduction 

Traditional optimization models usually assume that the input is known a priori. However, in most
applications, the data is either revealed over time or only coarse information about the input is known, often
modeled in terms of a probability distribution. Consequently, much effort has been directed towards
understanding the quality of solutions that can be obtained without full knowledge of the input, which led to the
development of online and stochastic optimization [6]. Emerging problems such as allocating advertisement
slots to advertisers and yield management in the internet are of inherent online nature and have further
accelerated this development [1].

Linear programming is arguably the most important and thus well-studied optimization problem.
Therefore, understanding the limitations of solving linear programs when complete data is not available is a
fundamental theoretical problem with a slew of applications, including the ad allocation and yield
management problems above. Indeed, a simple linear program with one uniform knapsack constraint, the Secretary
Problem, was one of the first online problems to be considered, and an optimal solution was already obtained
by the early 60's [13, 15]. Although the single knapsack case is currently well-understood under different
models of how information is revealed [?], much less is known about problems with multiple knapsacks, and
only recently have algorithms with solution guarantees been developed [14, 1, 10].

The Model. We study online packing LP's in the random permutation model. Consider a fixed but unknown
LP with $n$ columns $a^1, a^2, \ldots, a^n \in [0,1]^m$, whose associated variables are constrained to be in $[0,1]$, and
$m$ packing constraints:

$$
\begin{aligned}
\mathrm{OPT} = \max\ & \sum_{t=1}^{n} \pi_t x_t \\
\text{s.t.}\ & \sum_{t=1}^{n} a^t x_t \le B \\
& x_t \in [0,1] \qquad t = 1, \ldots, n.
\end{aligned}
\tag{LP}
$$

Columns are presented in uniformly random order, and when a column is presented we are required to
irrevocably choose the value of its corresponding variable. We assume that the number of columns $n$ is
known.¹ The goal is to obtain a feasible solution while maximizing its value. We use OPT to denote the
optimum value of the (offline) LP.

By scaling down rows as necessary, we assume without loss of generality that all entries of $B$ are the
same, which we also denote with some overload of notation by $B$. Due to the packing nature of the problem,
we also assume without loss of generality that all the $\pi_t$'s are non-negative and all the $a^t$'s are non-zero: we
can simply ignore columns which do not satisfy the first property and always set to 1 the variables associated
with the remaining columns which do not satisfy the second property. Finally, we assume that the columns
$(\pi_t, a^t)$ are in general position: for all $p \in \mathbb{R}^m$, there are at most $m$ different $t \in [n]$ such that $\pi_t = p a^t$. Notice
that perturbing the input randomly by a tiny amount achieves this property with probability one, while the
effect of the perturbation is absorbed in our approximation guarantees [11, 1].

Related work. The random permutation model has grown in popularity [16, 11, 1] since it avoids strong
lower bounds of the pessimistic adversarial-order model [?] while still capturing the lack of total information
a priori. Different online problems have already been studied in this model, including bin-packing [19],
matchings [18, 16], the AdWords Problem [11] and different generalizations of the Secretary Problem [?, 2,
5, 24, 17]. Closest to our work are packing problems with a single knapsack constraint. In [20], Kleinberg
considered the $B$-Choice Secretary Problem, where the goal is to select at most $B$ items coming online in
random order to maximize profit. The author presented an algorithm with competitive ratio $1 - O(1/\sqrt{B})$
and showed that $1 - \Omega(1/\sqrt{B})$ is best possible. Generalizing the $B$-Choice Secretary Problem, Babaioff et
al. considered the online knapsack problem and presented a $1/(10e)$-competitive algorithm. Notice that
in both cases the competitive ratio does not depend on $n$.

Despite all these works, the first results for more general online packing LP's were only recently
obtained by Feldman et al. [14] and Agrawal et al. [1]. The first paper presents an algorithm that obtains
with high probability a solution of value at least $(1-\epsilon)\mathrm{OPT}$ whenever $B \ge \Omega(\frac{m \log n}{\epsilon^3})$ and
$\mathrm{OPT} \ge \Omega(\frac{\pi_{\max} m \log n}{\epsilon})$, where $\pi_{\max}$ is the largest profit. In the second paper, the authors present an algorithm which
obtains a solution of expected value at least $(1-\epsilon)\mathrm{OPT}$ under the weaker assumptions $B \ge \Omega(\frac{m}{\epsilon^2}\log\frac{n}{\epsilon})$
or $\mathrm{OPT} \ge \Omega(\frac{\pi_{\max} m^2}{\epsilon^2}\log\frac{n}{\epsilon})$. One other way of stating this result is that the algorithm obtains a solution
with competitive ratio $1 - O\big(\sqrt{m \log(n) \log(B)/B}\big)$; notice that the guarantee degrades as $n$ increases. The
current lower bound on $B$ to allow $(1-\epsilon)$-competitive algorithms is $B \ge \Omega(\frac{\log m}{\epsilon^2})$, also presented in [1]. We
remark that these algorithms actually work for more general allocation problems, where a set of columns
representing various options arrive at each step and the solution may choose at most one of the options.

¹ Actually knowing $n$ up to a $(1 \pm \epsilon)$ factor is enough. This assumption is required to allow algorithms with non-trivial competitive
ratio [11].
Both of the above algorithms use a connection between solving the online LP and PAC-learning [5] a
linear classification of its columns, which was initiated by Devanur and Hayes [11] in the context of the
AdWords problem. Here we further explore this connection and our improved bounds can be seen as a
consequence of making the learning algorithm more robust by suitably changing the input LP. Robustness
is a topic well-studied in learning theory [12, 21], although existing results do not seem to apply directly to
our problem. We remark that a component of robustness more closely related to the standard PAC-learning
literature is used in [11].

In recent work, Devanur et al. [10] consider the weaker i.i.d. model for the general allocation problem.
While in the random permutation model one can think that columns are sampled without replacement, in the
i.i.d. model they are sampled with replacement. Making use of the independence between samples, Devanur
et al. substantially improve the requirement on $B$ to $O(\frac{\log(m/\epsilon)}{\epsilon^2})$ while showing that the lower bound $\Omega(\frac{\log m}{\epsilon^2})$
still holds in this model. We remark, however, that these models can present very different behaviors: as
a simple example, consider an LP with $n$ columns, $m = 1$ constraint and budget $B = 1$, where only one
of the columns has $\pi_t = a^t = 1$ and all others have $\pi_t = a^t = 0$; in the random permutation model the
expected value of the optimal solution is 1, while in the i.i.d. model this value is $1 - (1 - 1/n)^n \to 1 - 1/e$.
The competitiveness of the algorithm of [10] under the permutation model is still unknown and was left as
an open problem by the authors.
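The gap between the two models in this example is easy to check numerically. The Python sketch below (function names are our own and hypothetical) computes the expected offline optimum of the instance above in both models, plus a seeded Monte Carlo estimate for the i.i.d. case; it only illustrates the arithmetic of the example, not any algorithm discussed in the text.

```python
import math
import random


def permutation_model_value(n: int) -> float:
    # Random permutation: the multiset of columns is fixed and only the arrival
    # order is random, so the special column (pi = a = 1) is always present.
    return 1.0


def iid_model_value(n: int) -> float:
    # i.i.d.: n draws with replacement; OPT = 1 iff the special column is drawn
    # at least once, which happens with probability 1 - (1 - 1/n)^n.
    return 1.0 - (1.0 - 1.0 / n) ** n


def iid_model_value_mc(n: int, trials: int = 20000, seed: int = 0) -> float:
    # Monte Carlo estimate of the same quantity; index 0 is the special column.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if any(rng.randrange(n) == 0 for _ in range(n)))
    return hits / trials


for n in (1, 2, 10, 1000):
    print(n, permutation_model_value(n), round(iid_model_value(n), 4))
print("limit 1 - 1/e =", round(1 - 1 / math.e, 4))
```

The exact i.i.d. value decreases from 1 (for $n = 1$) towards $1 - 1/e \approx 0.632$, while the permutation-model value stays at 1 for every $n$.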

Our results. Our focus is to understand how large $B$ is required to be in order to allow $(1-\epsilon)$-competitive
algorithms. In particular, the requirements for $B$ in the above algorithms degrade as the number of columns
in the LP increases, while the lower bound does not. With the trend of handling LP's with a larger
number of columns (e.g. columns correspond to the keywords in the ad allocation problem, which in turn
correspond to visits of a search engine's webpage), this gap is very unsatisfactory from a practical point
of view. Furthermore, given that guarantees for the single knapsack case do not depend on the number of
columns, it is important to understand if the multi-knapsack case is fundamentally more difficult. In this
work, we give a precise indication of why the latter problem was resistant to arguments used in the single
knapsack case, and overcome this difficulty to exhibit an algorithm with a dimension-independent guarantee.

We show that a modification of the DPA algorithm from [1] that we call Robust DPA obtains a $(1-\epsilon)$-competitive
solution for online packing LP's with $m$ constraints in the random permutation model whenever
$B \ge \Omega(\frac{m^2}{\epsilon^2}\log\frac{m}{\epsilon})$. Another way of stating this result is that the algorithm has competitive ratio
$1 - O(m\sqrt{\log B / B})$. In contrast to previous results, our guarantee does not depend on $n$ and in the case
$m = 1$ matches the bounds for the $B$-Choice Secretary Problem up to lower-order terms. We finally remark
that we can replace the requirement on $B$ by a corresponding lower bound on OPT, exactly as done in
Section 5.1 of [1].

High-level outline. As mentioned before, we use the connection between solving an online LP and PAC-learning
a good linear classification of its columns; in order to obtain the improved guarantee, we focus
on tightening the bounds for the generalization error of the learning problem. More precisely, solving the
LP can be seen as classifying the columns into 0/1, which corresponds to setting their associated variables
to 0/1. Consider a family $\mathcal{X} \subseteq \{0,1\}^n$ of linear classifications of the columns. Our algorithms sample a
set $S$ of columns and learn a classification $x^S \in \mathcal{X}$ which is "good" for the columns $S$ (i.e., obtains large
proportional revenue while not filling up the proportionally scaled budget too much). The goal is to upper
bound the probability that $x^S$ is not good for the whole LP; this is typically done via a union bound over the
classifications in $\mathcal{X}$ [11, 1].

To obtain improved guarantees, we refine this bound using an argument akin to covering: we consider
witnesses (Section 2.2), which are representatives of groups of 'similar' bad classifications that can be used
to bound the probability that any classification in the group is learned; for that we need to use a non-standard
measure of similarity between classifications which is based on the budget of the LP. The problem is that,
when the columns $(\pi_t, a^t)$'s do not lie in a two-dimensional subspace of $\mathbb{R}^{m+1}$, the set $\mathcal{X}$ may contain a
large number of mutually dissimilar bad classifications; this is a roadblock for obtaining a small set of
witnesses. In stark contrast, when these columns do lie in a two-dimensional subspace (e.g., $m = 1$), these
classifications have a much nicer structure which indeed allows a small set of witnesses. This indicates that
the latter learning problem is intrinsically more robust than the former, which seems to capture precisely the
increased difficulty in obtaining good bounds for the multi-row case.

Motivated by this discussion we first consider LP's whose columns $a^t$'s lie in few one-dimensional
subspaces (Section 2). For each of these subspaces, we are able to approximate the classifications induced
on the columns lying in the subspace by considering a small subset of the induced classifications; patching
together these partial classifications gives us a witness set for $\mathcal{X}$. However, this strategy as stated does
not make use of the fact that the subspaces are embedded in an $m$-dimensional space, and hence leads to
large witness sets. By establishing a connection between the "useful" patching possibilities and faces of a
hyperplane arrangement in $\mathbb{R}^m$ (Lemma 2.12), we are able to make use of the dimension of the host space
and exhibit witness sets of much smaller sizes, which leads to improved bounds.

For a general packing LP, we perturb the columns $a^t$'s to make them lie in few one-dimensional subspaces
that form an '$\epsilon$-net' of the space, while not altering the feasibility and optimality of the LP by more
than a $(1 \pm \epsilon)$ factor (Section 3). Finally, we tighten the bound by using the idea of periodically recomputing
the classification, following [1] (Section 4).



2 OTP for almost 1-dim columns 

In this section we describe and analyze the algorithm OTP (One-Time Pricing) over LP's whose columns are
contained in few 1-dimensional subspaces of $\mathbb{R}^m$. The overall goal is to find an appropriate dual (perhaps
infeasible) solution $p$ for (LP) and use it to classify the columns of the LP. More precisely, given $p \in \mathbb{R}^m$,
we define $x(p)_t = 1$ if $\pi_t > p a^t$ and $x(p)_t = 0$ otherwise. Thus, $x(p)$ is the result of classifying the
columns $(\pi_t, a^t)$'s with the homogeneous hyperplane in $\mathbb{R}^{m+1}$ with normal $(-1, p)$. The motivation behind
this classification is that it selects the columns which have positive reduced cost with respect to the dual
solution $p$, or alternatively, it solves to optimality the Lagrangian relaxation using $p$ as multipliers.
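As a minimal illustration of this definition, the Python function below (a hypothetical helper, not code from the text) computes $x(p)$ from the profits $\pi_t$ and columns $a^t$; ties $\pi_t = p a^t$ are classified as 0, exactly as in the definition.

```python
from typing import List, Sequence


def classify(p: Sequence[float], pi: Sequence[float],
             columns: Sequence[Sequence[float]]) -> List[int]:
    """Return x(p), i.e. x(p)_t = 1 iff pi_t > p . a^t and 0 otherwise.

    Equivalently, column (pi_t, a^t) is labeled 1 iff it lies strictly on the
    negative side of the hyperplane with normal (-1, p), i.e. it has positive
    reduced cost with respect to the dual prices p.
    """
    x = []
    for pi_t, a_t in zip(pi, columns):
        reduced_cost = pi_t - sum(p_i * a_ti for p_i, a_ti in zip(p, a_t))
        x.append(1 if reduced_cost > 0 else 0)
    return x


print(classify([2, 0], [3, 1, 2], [[1, 0], [0, 1], [1, 1]]))  # [1, 1, 0]
```

In the usage line the third column has reduced cost exactly 0, so it is rejected.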

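Sampling LP's. In order to obtain a good dual solution $p$ we use the (random) LP consisting of the first $s$
columns of (LP) with appropriately scaled right-hand side, together with its dual:

$$
\begin{aligned}
\max\ & \sum_{t=1}^{s} \pi_{\sigma(t)} x_{\sigma(t)} \\
\text{s.t.}\ & \sum_{t=1}^{s} a^{\sigma(t)} x_{\sigma(t)} \le \frac{s}{n}\,\delta B \\
& x_{\sigma(t)} \in [0,1] \qquad t = 1, \ldots, s.
\end{aligned}
\tag{$(s,\delta)$-LP}
$$

$$
\begin{aligned}
\min\ & \frac{s}{n}\,\delta B \sum_{i=1}^{m} p_i + \sum_{t=1}^{s} \alpha_{\sigma(t)} \\
\text{s.t.}\ & p\, a^{\sigma(t)} + \alpha_{\sigma(t)} \ge \pi_{\sigma(t)} \qquad t = 1, \ldots, s \\
& p \ge 0, \quad \alpha \ge 0.
\end{aligned}
\tag{$(s,\delta)$-Dual}
$$

Here $\sigma$ denotes the random permutation of the columns of the LP. We use $\mathrm{OPT}(s,\delta)$ to denote the optimal
value of $(s,\delta)$-LP and $\mathrm{OPT}(s)$ to denote the optimal value of $(s,1)$-LP.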
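For intuition, the single-row case $m = 1$ can be solved without an LP solver: $(s,\delta)$-LP is then a fractional knapsack with capacity $\frac{s}{n}\delta B$, and an optimal dual price $p$ is the profit-to-size ratio of the column where the greedy solution stops (or 0 if every column fits). The sketch below assumes $m = 1$ and exact rational inputs, and its function names are hypothetical. It recovers $\alpha_t = \max(0, \pi_t - p a^t)$ and lets one check that the primal and dual objective values coincide.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def sample_capacity(s: int, n: int, delta: Fraction, B: Fraction) -> Fraction:
    # Right-hand side of the (s, delta)-LP: (s / n) * delta * B.
    return Fraction(s, n) * delta * B


def solve_single_row_sample(pi: Sequence[Fraction], a: Sequence[Fraction],
                            capacity: Fraction
                            ) -> Tuple[List[Fraction], Fraction, List[Fraction]]:
    """Greedy optimum of the (s, delta)-LP for m = 1, plus an optimal dual (p, alpha).

    Assumes every a[t] > 0 (the a^t's are non-zero, as in the text).
    """
    x = [Fraction(0)] * len(pi)
    p = Fraction(0)  # the dual price stays 0 if all sampled columns fit
    remaining = capacity
    # Consider columns by decreasing profit-to-size ratio.
    for t in sorted(range(len(pi)), key=lambda t: pi[t] / a[t], reverse=True):
        if a[t] <= remaining:
            x[t] = Fraction(1)
            remaining -= a[t]
        else:
            x[t] = remaining / a[t]  # critical column, possibly taken fractionally
            p = pi[t] / a[t]         # its ratio is an optimal dual price
            break
    alpha = [max(Fraction(0), pi_t - p * a_t) for pi_t, a_t in zip(pi, a)]
    return x, p, alpha


def primal_value(pi: Sequence[Fraction], x: Sequence[Fraction]) -> Fraction:
    return sum((pi_t * x_t for pi_t, x_t in zip(pi, x)), Fraction(0))


def dual_value(p: Fraction, alpha: Sequence[Fraction], capacity: Fraction) -> Fraction:
    return capacity * p + sum(alpha, Fraction(0))
```

For example, with profits $(6, 4, 3)$, sizes $(2, 2, 3)$ and capacity 3, the greedy solution takes the first column fully and half of the second, so $p = 2$ and both objectives equal 8.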

The static pricing algorithm OTP of [1] can then be described as follows:²

1. Wait for the first $\epsilon n$ columns of the LP (indexed by $\sigma(1), \sigma(2), \ldots, \sigma(\epsilon n)$) and solve the $(\epsilon n, 1-\epsilon)$-Dual.
Let $(p, \alpha)$ be the obtained dual optimal solution.

2. Use the classification given by $p$ as above by setting $x_{\sigma(t)} = x(p)_{\sigma(t)}$ for $t = \epsilon n + 1, \epsilon n + 2, \ldots$ for
as long as the solution obtained remains feasible. From this point on set all further variables to zero.

² To simplify the exposition, we assume that $\epsilon n$ is an integer.
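The two steps of OTP can be sketched in Python for the single-row case $m = 1$, where the dual price of the sampled LP is again the critical profit-to-size ratio of a greedy fractional knapsack; for general $m$ one would instead solve the $(\epsilon n, 1-\epsilon)$-Dual with an LP solver. The names below are hypothetical, `order` plays the role of the random permutation $\sigma$, and the sketch assumes $\epsilon n$ is an integer, as in the footnote.

```python
import random
from fractions import Fraction
from typing import List, Sequence, Tuple


def single_row_dual_price(pi: Sequence[Fraction], a: Sequence[Fraction],
                          capacity: Fraction) -> Fraction:
    # Optimal dual price of the fractional knapsack: ratio of the first column
    # (in decreasing ratio order) that no longer fits, or 0 if all columns fit.
    remaining = capacity
    for t in sorted(range(len(pi)), key=lambda t: pi[t] / a[t], reverse=True):
        if a[t] <= remaining:
            remaining -= a[t]
        else:
            return pi[t] / a[t]
    return Fraction(0)


def otp_single_row(pi: Sequence[Fraction], a: Sequence[Fraction], B: Fraction,
                   eps: Fraction, order: Sequence[int]) -> List[int]:
    n = len(pi)
    s = int(eps * n)  # number of sampled columns; eps * n assumed integral
    sample = order[:s]
    # Step 1: learn the dual price from the (eps n, 1 - eps)-LP on the sample.
    capacity = Fraction(s, n) * (1 - eps) * B
    p = single_row_dual_price([pi[t] for t in sample], [a[t] for t in sample], capacity)
    # Step 2: set x_t = x(p)_t for the remaining columns while the solution stays
    # feasible; after the first violation every further variable is left at 0.
    x = [0] * n
    used = Fraction(0)
    for t in order[s:]:
        if pi[t] > p * a[t]:
            if used + a[t] > B:
                break
            x[t] = 1
            used += a[t]
    return x


def make_instance(n: int, seed: int) -> Tuple[List[Fraction], List[Fraction], List[int]]:
    # Random test instance: integer profits, sizes in (0, 1], random arrival order.
    rng = random.Random(seed)
    pi = [Fraction(rng.randint(1, 100)) for _ in range(n)]
    a = [Fraction(rng.randint(1, 10), 10) for _ in range(n)]
    order = list(range(n))
    rng.shuffle(order)
    return pi, a, order
```

By construction the returned solution is always feasible and leaves the sampled columns at zero, matching the remark that OTP outputs a feasible solution with probability one.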

Note that by definition this algorithm outputs a feasible solution with probability one. Our goal is then 
to analyze the quality of the solution produced, ultimately leading to the following theorem. 

Theorem 2.1 Fix $\epsilon \in (0,1]$. Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns
$a^t$'s and that $B \ge \Omega(\frac{m}{\epsilon^3}\log\frac{K}{\epsilon})$. Then algorithm OTP returns a feasible solution with expected value at
least $(1-5\epsilon)\mathrm{OPT}$.

Let $S = \{\sigma(1), \ldots, \sigma(\epsilon n)\}$ be the (random) index set of the columns sampled by OTP. We use $p^S$
to denote the optimal dual solution obtained by OTP; notice that $p^S$ is completely determined by $S$. To
simplify the notation, we also use $x^S$ to denote $x(p^S)$.

Notice that, for all the scenarios where $x^S$ is feasible, the solution returned by OTP is identical to $x^S$ with
its components $x^S_{\sigma(1)}, \ldots, x^S_{\sigma(\epsilon n)}$ set to zero. Given this observation and the fact that
$\mathbb{E}\big[\sum_{t \le \epsilon n} \pi_{\sigma(t)} x^S_{\sigma(t)}\big] \le \epsilon\,\mathrm{OPT}$, one can prove that the following lemma implies Theorem 2.1.



Lemma 2.2 Fix $\epsilon \in (0,1]$. Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns
$a^t$'s and that $B \ge \Omega(\frac{m}{\epsilon^3}\log\frac{K}{\epsilon})$. Then with probability at least $(1-\epsilon)$, $x^S$ is a feasible solution for (LP)
with value at least $(1-3\epsilon)\mathrm{OPT}$.

2.1 Connection to PAC learning 

We assume from now on that $B \ge \Omega(\frac{m}{\epsilon^3}\log\frac{K}{\epsilon})$. Let $\mathcal{X} = \{x(p) : p \in \mathbb{R}^m_+\} \subseteq \{0,1\}^n$ denote the set of all
possible linear classifications of the LP columns which can be generated by OTP. With slight overload in
the notation, we identify a vector $x \in \{0,1\}^n$ with the subset of $[n]$ corresponding to its support.

Definition 2.3 (Bad solution) Given a scenario, we say that $x^S$ is bad if it does not satisfy the properties
of Lemma 2.2, namely $x^S$ is either infeasible or has value less than $(1-3\epsilon)\mathrm{OPT}$. We say that $x^S$ is good
otherwise.

As noted in previous work, since our decisions are made based on reduced costs it suffices to analyze the
budget occupation (or complementary slackness) of the solution in order to understand its value. To make
this precise, given $x \in \{0,1\}^n$ let $a_i(x) = \sum_{t \in x} a_i^t$ be its occupation of the $i$th budget and let
$a_i^S(x) = \frac{1}{\epsilon}\sum_{t \in x \cap S} a_i^t$ be its appropriately scaled occupation of the $i$th budget in the sampled LP (recall $|S| = \epsilon n$).
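The identity $a_i(x) = \mathbb{E}[a_i^S(x)]$ used below can be verified exactly on a toy instance: since $\sigma$ is a uniform permutation, $S$ is a uniform subset of size $\epsilon n$, so averaging $a_i^S(x)$ over all such subsets must give $a_i(x)$. The Python sketch below (hypothetical helper names; $x$ and $S$ are represented as sets of indices, as in the text) does this by brute-force enumeration, which is only feasible for very small $n$.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence, Set


def occupation(x: Set[int], a_i: Sequence[Fraction]) -> Fraction:
    # a_i(x): occupation of the i-th budget by the columns in x.
    return sum((a_i[t] for t in x), Fraction(0))


def sampled_occupation(x: Set[int], S: Set[int], a_i: Sequence[Fraction],
                       eps: Fraction) -> Fraction:
    # a_i^S(x) = (1 / eps) * sum of a_i^t over t in the intersection of x and S.
    return occupation(x & S, a_i) / eps


def expected_sampled_occupation(x: Set[int], a_i: Sequence[Fraction],
                                eps: Fraction) -> Fraction:
    # Exact expectation over all subsets S of size eps * n (brute force).
    n = len(a_i)
    s = int(eps * n)
    samples = [set(S) for S in combinations(range(n), s)]
    total = sum((sampled_occupation(x, S, a_i, eps) for S in samples), Fraction(0))
    return total / len(samples)
```

Each index belongs to $S$ with probability exactly $\epsilon$, which is why the $1/\epsilon$ scaling makes the estimator unbiased.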

Lemma 2.4 Consider a scenario where $x^S$ satisfies: (i) for all $i \in [m]$, $a_i(x^S) \le B$ and (ii) for all $i \in [m]$
with $p^S_i > 0$, $a_i(x^S) \ge (1-3\epsilon)B$. Then $x^S$ is good.

Moreover, since we are making decisions based on the optimal reduced cost for the sampled LP, our 
solution satisfies the above properties for the sampled LP. 

Lemma 2.5 In every scenario, $x^S$ satisfies the following: (i) for all $i \in [m]$, $a_i^S(x^S) \le (1-\epsilon)B$ and (ii)
for every $i \in [m]$ with $p^S_i > 0$, $a_i^S(x^S) \ge (1-2\epsilon)B$.
Given that $a_i(x) = \mathbb{E}[a_i^S(x)]$ for all $x$, the idea is to use concentration inequalities to argue that the
conditions in Lemma 2.4 hold with good probability. Although concentration of $a_i^S(x)$ for fixed $x$ can be
achieved via Chernoff-type bounds, the quantity $a_i^S(x^S)$ has undesired correlations; obtaining an effective
bound is the main technical contribution of this paper.

Definition 2.6 (Badly learnable) For a given scenario, we say that $x \in \mathcal{X}$ can be badly learned for budget
$i$ if either (i) $a_i^S(x) \le (1-\epsilon)B$ and $a_i(x) > B$ or (ii) $a_i^S(x) \ge (1-2\epsilon)B$ and $a_i(x) < (1-3\epsilon)B$.

Essentially these are the classifications which look good for the sampled $(\epsilon n, 1-\epsilon)$-LP but are actually
bad for (LP). Putting Lemmas 2.4 and 2.5 together and unraveling the definitions gives that

$$
\Pr(x^S \text{ is bad}) \le \Pr\Big(\bigvee_{i \in [m]} \bigvee_{x \in \mathcal{X}} x \text{ can be badly learned for budget } i\Big).
$$

Notice that the right-hand side of this inequality does not depend on $x^S$; it is only a function of how skewed
$a_i^S(x)$ is as compared to its expectation $a_i(x)$.

Usually the right-hand side in the previous equation is upper bounded by taking a union bound over all its
terms [1]. Unfortunately this is too wasteful: when $x$ and $x'$ are "similar" there is a large overlap between the
scenarios where $a_i^S(x)$ is skewed and those where $a_i^S(x')$ is skewed. In order to obtain improved guarantees,
we introduce in the next section a new way of bounding the right-hand side of the above expression.

2.2 Similarity via witnesses 

First, we partition the classifications which can be badly learned for budget $i$ into two sets, depending on
why they are bad: for $i \in [m]$, let $\mathcal{X}_i^+ = \{x \in \mathcal{X} : a_i(x) > B\}$ and $\mathcal{X}_i^- = \{x \in \mathcal{X} : a_i(x) < (1-3\epsilon)B\}$.
In order to simplify the notation, given a set $x$ we define $\mathrm{skewm}_i(\epsilon, x)$ to be the event that $a_i^S(x) \le (1-\epsilon)B$
and $\mathrm{skewp}_i(\epsilon, x)$ to be the event that $a_i^S(x) \ge (1-2\epsilon)B$. Notice that if $x \in \mathcal{X}_i^+$, then $\mathrm{skewm}_i(\epsilon, x)$ is
the event that $a_i^S(x)$ is significantly smaller than its expectation (skewed in the minus direction), while for
$x \in \mathcal{X}_i^-$, $\mathrm{skewp}_i(\epsilon, x)$ is the event that $a_i^S(x)$ is significantly larger than its expectation (skewed in the plus
direction). These definitions directly give the equivalence

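$$
\Pr\Big(\bigvee_{x \in \mathcal{X}} x \text{ can be badly learned for budget } i\Big) = \Pr\Big(\bigvee_{x \in \mathcal{X}_i^+} \mathrm{skewm}_i(\epsilon, x) \vee \bigvee_{x \in \mathcal{X}_i^-} \mathrm{skewp}_i(\epsilon, x)\Big).
$$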

In order to introduce the concept of witnesses, consider two sets $x, x'$, say, in $\mathcal{X}_i^+$. Take a subset
$w \subseteq x \cap x'$; the main observation is that, since $a^t \ge 0$ for all $t$, for all scenarios we have $a_i^S(w) \le a_i^S(x)$
and $a_i^S(w) \le a_i^S(x')$. In particular, the event $\mathrm{skewm}_i(\epsilon, x) \vee \mathrm{skewm}_i(\epsilon, x')$ is contained in $\mathrm{skewm}_i(\epsilon, w)$.
The set $w$ serves as a witness for scenarios which are skewed for either $x$ or $x'$; if additionally $a_i(w)$ is
reasonably larger than $(1-\epsilon)B$, we can then use concentration inequalities over $\mathrm{skewm}_i(\epsilon, w)$ in order to
bound the probability of $\mathrm{skewm}_i(\epsilon, x) \vee \mathrm{skewm}_i(\epsilon, x')$. This ability of bounding multiple terms of the right-hand
side of (2.2) simultaneously is what gives an improvement over the naive union bound.
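The containment argument can be checked by brute force on a toy instance. The Python sketch below (hypothetical names; the instance is arbitrary) enumerates every sample $S$ of size $\epsilon n$ and verifies that whenever $\mathrm{skewm}_i(\epsilon, x)$ or $\mathrm{skewm}_i(\epsilon, x')$ occurs, so does $\mathrm{skewm}_i(\epsilon, w)$ for $w \subseteq x \cap x'$; consequently the probability of the union is at most the probability of the single witness event.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence, Set, Tuple


def scaled_occupation(x: Set[int], S: Set[int], a_i: Sequence[Fraction],
                      eps: Fraction) -> Fraction:
    # a_i^S(x) = (1 / eps) * sum of a_i^t over t in the intersection of x and S.
    return sum((a_i[t] for t in x & S), Fraction(0)) / eps


def skewm(x: Set[int], S: Set[int], a_i: Sequence[Fraction],
          eps: Fraction, B: Fraction) -> bool:
    # The event skewm_i(eps, x): a_i^S(x) <= (1 - eps) B.
    return scaled_occupation(x, S, a_i, eps) <= (1 - eps) * B


def witness_probabilities(x, x_prime, w, a_i, eps, B) -> Tuple[Fraction, Fraction]:
    """Return (Pr[skewm(x) or skewm(x')], Pr[skewm(w)]) over uniform samples S.

    Raises AssertionError if some scenario is skewed for x or x' but not for w.
    """
    assert w <= (x & x_prime), "the witness must be contained in both x and x'"
    n = len(a_i)
    samples = [set(S) for S in combinations(range(n), int(eps * n))]
    union_count = witness_count = 0
    for S in samples:
        in_union = skewm(x, S, a_i, eps, B) or skewm(x_prime, S, a_i, eps, B)
        in_witness = skewm(w, S, a_i, eps, B)
        assert in_witness or not in_union  # monotonicity, since a^t >= 0
        union_count += in_union
        witness_count += in_witness
    return Fraction(union_count, len(samples)), Fraction(witness_count, len(samples))
```

The check never fails because shrinking a set can only decrease its sampled occupation; this is exactly the monotonicity used above.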



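Definition 2.7 (Witness) We say that $\mathcal{W}_i^+$ is a witness set for $\mathcal{X}_i^+$ if: (i) for all $w \in \mathcal{W}_i^+$, $a_i(w) \ge (1-\epsilon/2)B$
and (ii) for all $x \in \mathcal{X}_i^+$ there is $w \in \mathcal{W}_i^+$ contained in $x$. Similarly, we say that $\mathcal{W}_i^-$ is a witness set
for $\mathcal{X}_i^-$ if: (i) for all $w \in \mathcal{W}_i^-$, $a_i(w) \le (1-3\epsilon/2)B$ and (ii) for all $x \in \mathcal{X}_i^-$ there is $w \in \mathcal{W}_i^-$ containing
$x$.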
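As indicated by the previous discussion, given witness sets $\mathcal{W}_i^+$ and $\mathcal{W}_i^-$ for $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$, we directly
get the bound

$$
\Pr\Big(\bigvee_{x \in \mathcal{X}_i^+} \mathrm{skewm}_i(\epsilon, x) \vee \bigvee_{x \in \mathcal{X}_i^-} \mathrm{skewp}_i(\epsilon, x)\Big) \le \Pr\Big(\bigvee_{w \in \mathcal{W}_i^+} \mathrm{skewm}_i(\epsilon, w) \vee \bigvee_{w \in \mathcal{W}_i^-} \mathrm{skewp}_i(\epsilon, w)\Big). \tag{2.1}
$$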

Putting together the last three displayed equations and using Chernoff-type bounds, we can get an upper
estimate on the probability that $x^S$ is bad in terms of the size of the witness sets.

Lemma 2.8 Suppose that, for all $i \in [m]$, there are witness sets for $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$ of size at most $M$. Then
$\Pr(x^S \text{ is bad}) \le 8mM\exp(-\Omega(\epsilon^3 B))$.

One natural choice of a witness set for, say, $\mathcal{X}_i^+$ is the collection of all of its minimal sets; unfortunately
this may not give a witness set of small enough size. But notice that a witness set need not be a subset of $\mathcal{X}_i^+$
(or even of $\mathcal{X}$). Allowing elements outside $\mathcal{X}_i^+$ gives the flexibility of obtaining witnesses which are associated
with multiple "similar" minimal elements of $\mathcal{X}_i^+$, which is effective in reducing the size of witness sets.

2.3 Small witness sets for almost 1-dim columns 

Given the previous lemma, our task is to find small witness sets. Unfortunately, when the $(\pi_t, a^t)$'s lie in a
space of dimension at least 3, $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$ may contain many ($\Omega(n)$) disjoint sets (see Figure 5.1), which
shows that in general we cannot find small witness sets directly. This sharply contrasts with the case where
the $(\pi_t, a^t)$'s lie in a 2-dimensional subspace of $\mathbb{R}^{m+1}$, where one can show that $\mathcal{X}$ is a union of 2 chains
with respect to inclusion. In the special case where the $a^t$'s lie in a 1-dimensional subspace of $\mathbb{R}^m$, we show
that $\mathcal{X}$ is actually a single chain (Lemma 2.10) and therefore we can take $\mathcal{W}_i^+$ as the minimal set of $\mathcal{X}_i^+$ and
$\mathcal{W}_i^-$ as the maximal set of $\mathcal{X}_i^-$.



Due to the above observations, we focus on LP's whose $a^t$'s lie in few 1-dimensional subspaces. In this
case, $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$ are sufficiently well-behaved so that we can find small (independent of $n$) witness sets.

Lemma 2.9 Suppose that there are $K \ge m$ 1-dimensional subspaces of $\mathbb{R}^m$ which contain the $a^t$'s. Then
there are witness sets for $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$ of size at most $(O(\frac{K}{\epsilon}\log\frac{K}{\epsilon}))^m$.

Assuming the hypothesis of the lemma, partition the index set $[n]$ into $C_1, C_2, \ldots, C_K$ such that for all
$j \in [K]$ the columns $\{a^t\}_{t \in C_j}$ belong to the same 1-dimensional subspace. Equivalently, for each $j \in [K]$
there is a vector $c^j$ of $\ell_\infty$-norm 1 such that for all $t \in C_j$ we have $a^t = \|a^t\|_\infty c^j$. An important observation
is that now we can order the columns (locally) by the ratio of profit over budget occupation: without loss of
generality assume that for all $j \in [K]$ and $t, t' \in C_j$ with $t < t'$, we have $\frac{\pi_t}{\|a^t\|_\infty} \ge \frac{\pi_{t'}}{\|a^{t'}\|_\infty}$.³



Given a classification $x$, we use $x|_{C_j}$ to denote its projection onto the coordinates in $C_j$; so $x|_{C_j}$ is the
induced classification on columns with indices in $C_j$. Similarly, we define $\mathcal{X}|_{C_j} = \{x|_{C_j} : x \in \mathcal{X}\}$ as
the set of all classifications induced on the columns in $C_j$. The most important structure that we get from
working with 1-d subspaces, which is implied by the local order of the columns, is the following.

Lemma 2.10 For each $j \in [K]$, the sets in $\mathcal{X}|_{C_j}$ are prefixes of $C_j$.
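Lemma 2.10 is easy to test empirically. The Python sketch below (hypothetical helpers, seeded random data) places all columns of one class on a single ray $a^t = \|a^t\|_\infty c$, sorts them by $\pi_t / \|a^t\|_\infty$ as in the text, and checks that $x(p)$ restricted to the class is a prefix for many random $p \ge 0$. It is a sanity check on random inputs, not a proof.

```python
import random
from typing import List, Sequence


def classify(p: Sequence[float], pi: Sequence[float],
             cols: Sequence[Sequence[float]]) -> List[int]:
    # x(p)_t = 1 iff pi_t > p . a^t
    return [1 if pi_t > sum(pj * aj for pj, aj in zip(p, a_t)) else 0
            for pi_t, a_t in zip(pi, cols)]


def is_prefix(bits: Sequence[int]) -> bool:
    # A 0/1 vector indicates a prefix iff a 0 is never followed by a 1.
    return all(not (bits[k] == 0 and bits[k + 1] == 1) for k in range(len(bits) - 1))


def check_prefix_property(m: int = 3, size: int = 30, trials: int = 500,
                          seed: int = 0) -> bool:
    rng = random.Random(seed)
    c = [rng.random() + 0.01 for _ in range(m)]
    top = max(c)
    c = [ci / top for ci in c]  # direction c with ||c||_inf = 1
    norms = [rng.uniform(0.1, 1.0) for _ in range(size)]
    profits = [rng.uniform(0.0, 2.0) for _ in range(size)]
    # Local order of the text: decreasing pi_t / ||a^t||_inf.
    order = sorted(range(size), key=lambda t: profits[t] / norms[t], reverse=True)
    pi = [profits[t] for t in order]
    cols = [[norms[t] * ci for ci in c] for t in order]
    for _ in range(trials):
        p = [rng.uniform(0.0, 3.0) for _ in range(m)]
        if not is_prefix(classify(p, pi, cols)):
            return False
    return True
```

The reason it works: for $t$ in the class, $p a^t = \|a^t\|_\infty (p c)$, so $x(p)_t = 1$ exactly when the ratio $\pi_t / \|a^t\|_\infty$ exceeds the single number $p c$.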



³ Notice that this ratio is well-defined since by assumption $a^t \neq 0$ for all $t \in [n]$.
To simplify the notation fix $i \in [m]$ for the rest of this section, so we aim at providing witness sets for $\mathcal{X}_i^+$
and $\mathcal{X}_i^-$. The idea is to group the classifications according to their budget occupation caused by the different
column classes $C_j$'s. To make this formal, start by covering the interval $[0, B + m]$ with intervals $\{I_\ell\}_{\ell \in L}$,
where $I_0 = [0, r)$ for a small threshold $r > 0$, $I_\ell = [r(1+\epsilon/4)^{\ell-1}, r(1+\epsilon/4)^{\ell})$ for $\ell > 0$, and
$L = \{0, \ldots, \lceil \log_{1+\epsilon/4}\frac{B+m}{r} \rceil\}$ (note that since $B \ge m$, we have $B + m \le 2B$). Define $\mathcal{B}_j^\ell$ as the set of partial classifications $y \in \mathcal{X}|_{C_j}$
whose budget occupation $a_i(y)$ lies in the interval $I_\ell$. For $v \in L^K$ define the family of classifications
$\mathcal{B}^v = \{(y^1, y^2, \ldots, y^K) : y^j \in \mathcal{B}_j^{v_j}\}$. The $\mathcal{B}^v$'s then provide the desired grouping of the classifications.
Note that the $\mathcal{B}^v$'s may include classifications not in $\mathcal{X}$ and may not include classifications in $\mathcal{X}$ which have
occupation $a_i(\cdot)$ greater than $B + m$.

Now consider a non-empty $\mathcal{B}^v$. Let $w_v$ be the inclusion-wise smallest element in $\mathcal{B}^v$. Notice that such a
unique smallest element exists: since $\mathcal{X}|_{C_j}$ is a chain, so is $\mathcal{B}_j^{v_j}$, and hence $w_v$ is the product (over $j$) of the
smallest elements in the sets $\{\mathcal{B}_j^{v_j}\}_j$. Similarly, let $w^v$ denote the largest element in $\mathcal{B}^v$. Intuitively, $w_v$ and
$w^v$ will serve as witnesses for all the sets in $\mathcal{B}^v$.

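Finally, define the witness sets by adding the $w_v$'s and $w^v$'s of appropriate size corresponding to meaningful
$\mathcal{B}^v$'s: set $\mathcal{W}_i^+ = \{w_v : v \in L^K, \mathcal{B}^v \cap \mathcal{X}_i^+ \neq \emptyset, a_i(w_v) \ge (1-\epsilon/2)B\}$ and $\mathcal{W}_i^- = \{w^v : v \in
L^K, \mathcal{B}^v \cap \mathcal{X}_i^- \neq \emptyset, a_i(w^v) \le (1-3\epsilon/2)B\}$.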

It is not too difficult to see that, say, $\mathcal{W}_i^+$ is a witness set for $\mathcal{X}_i^+$. If $x \in \mathcal{X}_i^+$ belongs to some $\mathcal{B}^v$, then
$w_v$ belongs to $\mathcal{W}_i^+$ and is easily shown to be a witness for $x$. However, if $x$ does not belong to any $\mathcal{B}^v$, by
having too large $a_i(x)$, the idea is to find $x' \subseteq x$ which belongs to some $\mathcal{B}^v$ and to $\mathcal{X}_i^+$, and then use $w_v$ as a
witness for $x$. We note that considering $\mathcal{B}^v$'s only for occupations up to $B + m$ and only adding witnesses for
$\mathcal{B}^v$'s which intersect $\mathcal{X}$ are crucially used for bounding the size of $\mathcal{W}_i^+$ and $\mathcal{W}_i^-$.

Lemma 2.11 The sets $\mathcal{W}_i^+$ and $\mathcal{W}_i^-$ are witness sets for $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$.

Bounding the size of witness sets. Clearly the witness sets $\mathcal{W}_i^+$ and $\mathcal{W}_i^-$ have size at most $|L|^K$. Although
this size is independent of $n$, it is still unnecessarily large since it only uses locally (for each $C_j$) the fact that
$\mathcal{X}$ consists of linear classifications; in particular, it does not use the dimension of the ambient space $\mathbb{R}^m$.
Now we sketch the argument for an improved bound; details are provided in the appendix.

First notice that the partial classification $x(p)|_{C_j}$ is completely defined by the value $p c^j$, since $p a^t = \|a^t\|_\infty\, p c^j$ for all $t \in C_j$. Thus, if $J \subseteq [K]$
is such that the directions $\{c^j\}_{j \in J}$ form a basis of $\mathbb{R}^m$, then knowing $p c^j$ for all $j \in J$ completely determines
the whole classification $x(p)$. Similarly, if we know that $x(p)|_{C_j} \in \mathcal{B}_j^{v_j}$ for all $j \in J$, then for each $j \notin J$
we should have fewer possible $\mathcal{B}_j^\ell$'s to which the partial classification $x(p)|_{C_j}$ can belong; this indicates
that some of the sets $\{\mathcal{B}^v\}_{v \in L^K}$ do not contain any element from $\mathcal{X}$, which implies a reduced size for the
witness sets.

In order to capture this idea, we focus on the space of dual vectors $p$ and define the sets $P_j^\ell = \{p \in \mathbb{R}^m_+ :
x(p)|_{C_j} \in \mathcal{B}_j^\ell\}$ and $P^v = \{p \in \mathbb{R}^m_+ : x(p) \in \mathcal{B}^v\}$. Notice that $P^v = \bigcap_j P_j^{v_j}$ and that $\mathcal{B}^v \cap \mathcal{X}$ is empty iff $P^v$ is.
The main step is to show that each $P_j^\ell$ is a polyhedron with "few" facets, which uses the definition of $x(p)$
and Lemma 2.10. We then consider the arrangement of the hyperplanes which are facet-defining for the
$P_j^\ell$'s and conclude that the $P^v$'s are given by unions of the cells in this arrangement; classical bounds on the
number of cells in a hyperplane arrangement in $\mathbb{R}^m$ then allow us to upper bound the number of nonempty
$P^v$'s. This gives the following.



Lemma 2.12 At most $(O(\frac{K}{\epsilon}\log\frac{K}{\epsilon}))^m$ of the $\mathcal{B}^v$'s contain an element from $\mathcal{X}$.



This lemma implies that $\mathcal{W}_i^+$ and $\mathcal{W}_i^-$ each have size at most $(O(\frac{K}{\epsilon}\log\frac{K}{\epsilon}))^m$, which then proves Lemma
2.9. Finally, applying Lemma 2.8 we conclude the proof of Lemma 2.2.
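To see why exploiting the arrangement helps, recall the classical fact that $N$ affine hyperplanes in general position cut $\mathbb{R}^m$ into exactly $\sum_{k=0}^{m}\binom{N}{k}$ full-dimensional cells, and no arrangement of $N$ hyperplanes has more. Since $x(p)|_{C_j}$ depends on $p$ only through $p c^j$, it is natural to picture each $P_j^\ell$ as cut out by hyperplanes of the form $\{p : p c^j = \theta\}$. Assuming, for illustration only, that this gives about $K(|L|+1)$ hyperplanes in total, the Python sketch below (hypothetical function names) compares the naive count $|L|^K$ with the cell bound. The numbers are illustrative and are not the constants of Lemma 2.12.

```python
from math import comb


def max_cells(num_hyperplanes: int, dim: int) -> int:
    # Maximum number of full-dimensional cells of an arrangement of
    # num_hyperplanes affine hyperplanes in R^dim (attained in general position).
    return sum(comb(num_hyperplanes, k) for k in range(dim + 1))


def compare_counts(K: int, L_size: int, m: int) -> tuple:
    # Naive bound |L|^K versus the cell bound for roughly K * (|L| + 1)
    # hyperplanes (an illustrative assumption on the facet-defining hyperplanes).
    naive = L_size ** K
    arrangement = max_cells(K * (L_size + 1), m)
    return naive, arrangement


print(compare_counts(K=20, L_size=10, m=2))  # (100000000000000000000, 24311)
```

The naive count grows exponentially in $K$, while the cell bound is a polynomial of degree $m$ in $K|L|$, which is the source of the $(\cdot)^m$ form of the bounds above.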
3 Robust OTP



In this section we consider (LP) with columns that may not belong to few 1-dimensional subspaces. Given
the results of the previous section we would like to perturb the columns of this LP so that they belong to
few 1-dim subspaces, and such that an approximate solution for this perturbed LP is also an approximate
solution for the original one. More precisely, we obtain a set of vectors $Q \subseteq \mathbb{R}^m$ and transform each column
$a^t$ into a column $\bar{a}^t$ which is a scaling of a vector in $Q$, and we let the rewards $\pi_t$ remain unchanged. The
crucial observation is that the solutions of an LP are robust to slight changes in the constraint matrix.

Lemma 3.1 Consider real numbers $\pi_1, \ldots, \pi_n$ and vectors $a^1, \ldots, a^n$ and $\bar{a}^1, \ldots, \bar{a}^n$ in $\mathbb{R}^m$ such that
$\|\bar{a}^t - a^t\|_\infty \le \frac{\epsilon}{m}\|a^t\|_\infty$. If $x$ is an $\epsilon$-approximate solution for (LP) with columns $(\pi_t, \bar{a}^t)$ and right-hand
side $(1-\epsilon)B$, then $x$ is a $2\epsilon$-approximate solution for (LP).



Perturbing the columns. To simplify the notation, set $\delta = \frac{\epsilon}{m}$; for simplicity of exposition we assume that
$1/\delta$ is integral. When constructing $Q$ we want the rays spanned by each of its vectors to be "uniform"
over $\mathbb{R}^m_+$. Using $\ell_\infty$ as normalization, let $Q$ be a $\delta$-net of the unit sphere, namely let $Q$ be the vectors in
$\{0, \delta, 2\delta, 3\delta, \ldots, 1\}^m$ which have $\ell_\infty$-norm 1. Note that $|Q| = (O(\frac{1}{\delta}))^m$.

Given a vector $a^t \in \mathbb{R}^m_+$ we let $\bar{a}^t = \|a^t\|_\infty q^t$, where $q^t$ is the vector in $Q$ closest (in $\ell_\infty$) to $\frac{a^t}{\|a^t\|_\infty}$. By
definition of $Q$, for every non-negative vector $v \in \mathbb{R}^m$ with $\|v\|_\infty = 1$ there is a vector $q \in Q$ with $\|v - q\|_\infty \le \delta$. It
then follows from positive homogeneity of norms that the $\bar{a}^t$'s satisfy the property required in Lemma 3.1:

$$\|\bar{a}^t - a^t\|_\infty \le \delta\,\|a^t\|_\infty.$$
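The construction of $Q$ and the rounding $a^t \mapsto \bar{a}^t$ can be sketched as follows in Python; the parameter inv_delta stands for $1/\delta$ and, like the function names, is hypothetical. Rounding each coordinate of $a^t/\|a^t\|_\infty$ to the nearest multiple of $\delta$ keeps the largest coordinate equal to 1, so the rounded vector lies in $Q$ and is an $\ell_\infty$-closest point of $Q$; the resulting error is at most $\frac{\delta}{2}\|a^t\|_\infty$, within the $\delta\|a^t\|_\infty$ required above.

```python
from fractions import Fraction
from itertools import product
from typing import List, Sequence, Tuple


def build_net(m: int, inv_delta: int) -> List[Tuple[Fraction, ...]]:
    """Q: all vectors of {0, delta, 2 delta, ..., 1}^m with l_inf-norm exactly 1."""
    grid = [Fraction(k, inv_delta) for k in range(inv_delta + 1)]
    return [q for q in product(grid, repeat=m) if max(q) == 1]


def perturb(a: Sequence[Fraction], inv_delta: int) -> List[Fraction]:
    """Return bar a = ||a||_inf * q, with q an l_inf-closest point of Q to a / ||a||_inf.

    Assumes a is non-negative and non-zero, as in the text.
    """
    norm = max(a)
    # Coordinate-wise rounding to the grid; the coordinate equal to 1 stays 1,
    # so the rounded vector belongs to Q.
    q = [Fraction(round(ai / norm * inv_delta), inv_delta) for ai in a]
    return [norm * qi for qi in q]
```

For example, with $\delta = 1/4$ the vector $(0.3, 0.9, 0.05)$ is mapped to $0.9 \cdot (1/4, 1, 0) = (0.225, 0.9, 0)$, whose largest coordinate error is $0.075 \le \delta \cdot 0.9$.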



Algorithm Robust OTP. One way to think of the algorithm Robust OTP is that it works in two phases. First,
it transforms the vectors $a^t$ into $\bar{a}^t$ as described above. Then it returns the solution obtained by running the
algorithm OTP over the LP with columns $(\pi_t, \bar{a}^t)$ and right-hand side $(1-\epsilon)B$. Notice that this algorithm
can indeed be implemented to run in an online fashion.

Putting together the discussion in the previous paragraphs and the guarantee of OTP for almost 1-dim
columns given by Theorem 2.1 with $K = |Q| = (O(\frac{m}{\epsilon}))^m$, we obtain the following theorem.

Theorem 3.2 Fix $\epsilon \in (0,1]$ and suppose $B \ge \Omega(\frac{m^2}{\epsilon^3}\log\frac{m}{\epsilon})$. Then algorithm Robust OTP returns a
solution to the online (LP) with expected value at least $(1-10\epsilon)\mathrm{OPT}$.



4 Robust DPA 

In this section we describe our final algorithm, which has an improved dependence on $1/\epsilon$. Following [1],
the idea is to update the dual vector used in the classification as new columns arrive: we use the first $2^i \epsilon n$
columns to classify columns $2^i \epsilon n + 1, \ldots, 2^{i+1} \epsilon n$. This leads to improved generalization bounds, which
in turn give the reduced dependence on $1/\epsilon$. The algorithm Robust DPA (as the algorithm DPA) can be
seen as a combination of solutions to multiple sampled LP's, obtained via a modification of OTP denoted
by $(s,\delta)$-OTP.

Algorithm $(s, \delta)$-OTP. This algorithm aims at solving the program (2s, 1)-LP and can be described as follows: it finds an optimal dual solution $(p, \alpha)$ for $(s, (1 - \delta))$-LP and sets $x_{\sigma(t)} = x(p)_{\sigma(t)}$ for $t = s + 1, s + 2, \ldots, t' \le 2s$, where $t'$ is the maximum index guaranteeing $\sum_{t=s+1}^{t'} a^{\sigma(t)} x_{\sigma(t)} \le \frac{s}{n}B$.
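The online part of $(s, \delta)$-OTP can be sketched as follows, assuming the dual vector $p$ has already been computed from the $(s, (1 - \delta))$-LP on the first $s$ arrivals by some offline LP solver (not shown). The function names and the list-of-pairs input format are illustrative assumptions. Since the $a^t$'s are non-negative, the cumulative usage in the window can only grow, so stopping at the first accepted column that would exceed $\frac{s}{n}B$ in some row produces exactly the maximal $t'$.

```python
def x_of_p(p, pi_t, a_t):
    """Threshold classification x(p)_t: 1 if pi_t > p . a^t, else 0."""
    return 1 if pi_t > sum(pk * ak for pk, ak in zip(p, a_t)) else 0


def s_delta_otp_window(p, columns, s, n, B):
    """Online part of (s, delta)-OTP (hypothetical sketch).

    columns: list of (pi_t, a_t) in arrival order sigma(1), ..., sigma(n).
    p: dual prices assumed to come from the (s, 1 - delta)-LP on the first s
       columns, solved offline (not shown here).
    Returns a 0/1 vector that is nonzero only on arrival positions s+1..t'
    (0-based indices s..t'-1), with t' <= 2s maximal such that the window
    usage stays at most (s / n) * B in every row."""
    cap = s * B / n
    used = [0.0] * len(p)
    x = [0] * len(columns)
    for t in range(s, min(2 * s, len(columns))):
        pi_t, a_t = columns[t]
        if x_of_p(p, pi_t, a_t):
            new_used = [u + ai for u, ai in zip(used, a_t)]
            if any(u > cap for u in new_used):
                break  # t' reached: this and all later columns stay at 0
            used = new_used
            x[t] = 1
    return x


cols = [(1.0, [0.5, 0.5])] * 8
print(s_delta_otp_window([0.5, 0.5], cols, s=4, n=8, B=2.0))
```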

The analysis of $(s, \delta)$-OTP is similar to the one employed for OTP. The main difference is that this algorithm tries to approximate the value of the random LP (2s, 1)-LP. This requires a partition of the bad classifications which is more refined than simply splitting into $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$, and witness sets need to be redefined appropriately. Nonetheless, using these ideas we can prove the following guarantee for $(s, \delta)$-OTP. Again let $S = \{\sigma(1), \sigma(2), \ldots, \sigma(s)\}$ be the random index set of the first $s$ columns of the LP, let $T = \{\sigma(s + 1), \sigma(s + 2), \ldots, \sigma(2s)\}$ and $U = S \cup T$. We use $\pi_U$ to denote the vector $(\pi_t)_{t \in U}$.

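Lemma 4.1 Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns $a^t$'s. Fix an integer $s$ and a real number $\delta \in (0, 1/10)$ such that $\frac{\delta^2 sB}{n} \ge \Omega\left(m \ln\frac{K}{\delta}\right)$. Then algorithm $(s, \delta)$-OTP returns a solution $x$ satisfying $a_i^T(x) \le B$ for all $i \in [m]$ with probability 1 and with expected value
$$\mathbb{E}[\pi_U x] \ge (1 - 3\delta)\mathbb{E}[\mathrm{OPT}(2s)] - \mathbb{E}[\mathrm{OPT}(s)] - \delta^2\mathrm{OPT}.$$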

Algorithm Robust DPA. In order to simplify the description of the algorithm, we assume in this section that $\log(1/\varepsilon)$ is an integer.

Again the algorithm Robust DPA can be thought of as acting in two phases. In the first phase it converts the vectors $a^t$ into $\tilde a^t$, just as in the first phase of Robust OTP. In the second phase, for $i = 0, \ldots, \log(1/\varepsilon) - 1$, it runs $(\varepsilon 2^i n, \sqrt{\varepsilon/2^i})$-OTP over (LP) with columns $(\pi_t, \tilde a^t)$ and right-hand side $(1 - \varepsilon)B$ to obtain the solution $x^i$. The algorithm finally returns the solution $x$ consisting of the "union" of the $x^i$'s: $x = \sum_i x^i$.

Note that the second phase corresponds exactly to using the first $\varepsilon 2^i n$ columns to classify the columns $\varepsilon 2^i n + 1, \ldots, \varepsilon 2^{i+1} n$. This relative increase in the size of the training data for each learning problem allows us to reduce the dependence of $B$ on $\varepsilon$ in each of the iterations, while the errors from all the iterations telescope and are still bounded as before. Furthermore, notice that Robust DPA can be implemented to run online.
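The phase structure of Robust DPA can be tabulated with a small Python sketch. It assumes, as in the text, that $\log(1/\varepsilon)$ is an integer, and additionally that $\varepsilon n$ is an integer; the name `dpa_schedule` and the dictionary layout are hypothetical. Phase $i$ trains on the first $s_i = \varepsilon 2^i n$ columns, classifies the next $s_i$ columns with per-phase cap $\frac{s_i}{n}B$, and uses $\delta_i = \sqrt{\varepsilon/2^i}$, the parameter choice made in the proof of Theorem 4.2.

```python
import math


def dpa_schedule(n, eps, B):
    """Phase windows of Robust DPA (hypothetical sketch).

    Assumes log2(1/eps) and eps * n are integers. B is the right-hand side of
    the LP being solved (for Robust DPA, (1 - eps) times the original B)."""
    L = round(math.log2(1 / eps))
    phases = []
    for i in range(L):
        s = round(eps * 2 ** i * n)            # training size s_i = eps 2^i n
        phases.append({
            "train": (1, s),                   # positions used to compute the dual prices
            "classify": (s + 1, 2 * s),        # positions classified in phase i
            "budget": s * B / n,               # per-phase cap (s_i / n) * B
            "delta": math.sqrt(eps / 2 ** i),  # delta_i = sqrt(eps / 2^i)
        })
    return phases


for phase in dpa_schedule(n=64, eps=1 / 8, B=1.0):
    print(phase)
```

With $n = 64$ and $\varepsilon = 1/8$ the classification windows are positions 9 to 16, 17 to 32 and 33 to 64, so together they cover every column after the initial $\varepsilon n$, and the per-phase caps add up to $(1 - \varepsilon)$ times the right-hand side passed in.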

The analysis of Robust DPA reduces to that of $(s, \delta)$-OTP. That is, using the definition of the parameters of $(s, \delta)$-OTP used in Robust DPA and Lemma 4.1, it is routine to check that the algorithm produces a feasible solution which has expected value at least $(1 - O(\varepsilon))\mathrm{OPT}$. This is formally stated in the following theorem.

Theorem 4.2 Fix $\varepsilon \in (0, 1/100)$ and suppose that $B \ge \Omega\left(\frac{m^2}{\varepsilon^2}\ln\frac{m}{\varepsilon}\right)$. Then the algorithm Robust DPA returns a solution to the online LP (LP) with expected value at least $(1 - 50\varepsilon)\mathrm{OPT}$.

5 Open problems 

A very interesting open question is whether the techniques introduced in this work can be used to obtain improved algorithms for generalized allocation problems [14]. The difficulty in this problem is that the classifications of the columns are not linear anymore; they essentially come from a conjunction of linear classifiers. Given this additional flexibility, having the columns in few 1-dimensional subspaces does not seem to impose strong enough properties on the classifications. It would be interesting to find the appropriate geometric structure of the columns in this case.

Of course, a direct open question is to improve the lower or upper bound on the dependence on the right-hand side $B$ for obtaining $(1 - \varepsilon)$-competitive algorithms. One possibility is to investigate how far the techniques presented here can be pushed and what their limitations are. Another possibility is to analyze the performance of the algorithm from [10] under the random permutation model.
Figure 5.1: Case $m = 2$, columns $(\pi_t, a^t)$ equal to $(1, \sin(\pi/4 + \delta t), \cos(\pi/4 + \delta t))$ for sufficiently small $\delta > 0$, represented by black dots. Each segment $\{t, t + 1, \ldots, t + j\}$ can be linearly classified and hence belongs to $\mathcal{X}$. Furthermore, all segments $\{j2B, \ldots, (j + 1)2B\}$ belong to $\mathcal{X}_i^+$, which then contains disjoint sets. Similar analysis holds for $\mathcal{X}_i^-$.

A Bernstein inequality for sampling without replacement 

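Lemma A.1 (Theorem 2.14.19 in [25]) Let $Y = \{Y_1, \ldots, Y_n\}$ be a set of real numbers in the interval $[0, 1]$ and let $0 < \varepsilon < 1$. Let $S$ be a random subset of $Y$ of size $s$ and let $Y_S = \sum_{i \in S} Y_i$. Setting $\mu = \frac{1}{n}\sum_i Y_i$ and $\sigma^2 = \frac{1}{n}\sum_i (Y_i - \mu)^2$, we have that for every $\tau > 0$
$$\Pr(|Y_S - s\mu| \ge \tau) \le 2\exp\left(-\frac{\tau^2}{2s\sigma^2 + \tau}\right).$$

Notice that, since the $Y_i$'s belong to the interval $[0, 1]$, we can upper bound the variance by the mean as follows:
$$\sigma^2 = \frac{1}{n}\sum_i (Y_i - \mu)^2 \le \frac{1}{n}\left(\sum_i Y_i^2 + \sum_i \mu^2\right) \le \frac{1}{n}\left(\sum_i Y_i + \sum_i \mu\right) = 2\mu.$$

This gives the following corollary.

Corollary A.2 Consider the conditions of the previous lemma. Then for all $\tau > 0$
$$\Pr(|Y_S - s\mu| \ge \tau) \le 2\exp\left(-\frac{\tau^2}{4s\mu + \tau}\right).$$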
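Corollary A.2 can be sanity-checked numerically by sampling without replacement. The Python sketch below is only an illustration on one hand-picked instance (the data, sample size, $\tau$ and trial count are arbitrary choices, and the helper names are hypothetical); a simulation cannot replace the proof.

```python
import math
import random


def empirical_tail(Y, s, tau, trials, seed=0):
    """Fraction of random size-s subsets S (sampled without replacement)
    for which |Y_S - s * mu| >= tau."""
    rng = random.Random(seed)
    mu = sum(Y) / len(Y)
    hits = 0
    for _ in range(trials):
        Y_S = sum(rng.sample(Y, s))       # sum over a uniformly random s-subset
        if abs(Y_S - s * mu) >= tau:
            hits += 1
    return hits / trials


def corollary_bound(Y, s, tau):
    """Right-hand side of Corollary A.2: 2 * exp(-tau^2 / (4 s mu + tau))."""
    mu = sum(Y) / len(Y)
    return 2 * math.exp(-tau ** 2 / (4 * s * mu + tau))


Y = [(i % 5) / 4 for i in range(400)]     # values in [0, 1] with mean 0.5
print(empirical_tail(Y, 200, 40.0, trials=500), corollary_bound(Y, 200, 40.0))
```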



B Proof of Lemmas 2.4 and 2.5 



Proof of Lemma 2.4: Fix a scenario $\sigma$ for the duration of the proof. By assumption $x^s$ is feasible for (LP), so it suffices to show that it attains value at least $(1 - 3\varepsilon)\mathrm{OPT}$. For that, consider (LP) with a modified right-hand side:
$$\max \sum_{t=1}^n \pi_t x_t \quad \text{s.t.} \quad \sum_{t=1}^n a^t_i x_t \le a_i(x^s) \ \ \forall i \in [m], \qquad x \in [0, 1]^n. \tag{modLP}$$

Consider the Lagrangian relaxation $L(p, x) = \sum_{t=1}^n \pi_t x_t - \sum_{i=1}^m p_i\left(\sum_{t=1}^n a^t_i x_t - a_i(x^s)\right)$. Since $L(p^s, x) = \sum_t (\pi_t - p^s a^t)x_t + \sum_i p^s_i a_i(x^s)$, the vector $x^s = x(p^s)$ is an optimal solution for $\max_{x \in [0,1]^n} L(p^s, x)$, and by weak duality (as $p^s \ge 0$) this maximum is at least $\mathrm{OPT}(\text{modLP})$, the optimum value of LP (modLP). Moreover $L(p^s, x^s) = \sum_t \pi_t x^s_t$. Since $x^s$ is clearly feasible for (modLP), it follows that $x^s$ is an optimal solution for the latter.

Now let $x^*$ be an optimal solution for (LP). Since $a_i(x^s) \ge (1 - 3\varepsilon)B$ for all $i$, and since the $a^t$'s are non-negative, it follows that $(1 - 3\varepsilon)x^*$ is feasible for (modLP). By linearity of the objective function we get that $\mathrm{OPT}(\text{modLP}) \ge (1 - 3\varepsilon)\sum_{t=1}^n \pi_t x^*_t = (1 - 3\varepsilon)\mathrm{OPT}$ and the result follows. ■
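The key step in the proof of Lemma 2.4 is that the threshold solution $x(p)$, which sets $x_t = 1$ exactly when $\pi_t > pa^t$, maximizes the Lagrangian $L(p, \cdot)$ over $[0, 1]^n$: the Lagrangian is linear in $x$ with coefficient $\pi_t - pa^t$ on $x_t$. The Python sketch below checks this by brute force over the vertices $\{0, 1\}^n$ (which suffice because the function is linear) on a small integer instance; the instance and function names are illustrative assumptions.

```python
import itertools


def lagrangian(p, x, profits, cols, rhs):
    """L(p, x) = sum_t pi_t x_t - sum_i p_i (sum_t a^t_i x_t - rhs_i)."""
    value = sum(pi * xt for pi, xt in zip(profits, x))
    for i, p_i in enumerate(p):
        usage = sum(a[i] * xt for a, xt in zip(cols, x))
        value -= p_i * (usage - rhs[i])
    return value


def threshold_solution(p, profits, cols):
    """x(p): x_t = 1 exactly when pi_t > p . a^t (ties are set to 0)."""
    return [1 if pi > sum(pk * ak for pk, ak in zip(p, a)) else 0
            for pi, a in zip(profits, cols)]


profits = [5, 3, 4, 1]
cols = [[1, 2], [2, 0], [1, 1], [0, 1]]   # columns a^t with m = 2 rows
rhs = [3, 3]                              # plays the role of a_i(x^s)
p = [1, 1]
best = max(lagrangian(p, x, profits, cols, rhs)
           for x in itertools.product([0, 1], repeat=len(profits)))
print(lagrangian(p, threshold_solution(p, profits, cols), profits, cols, rhs), best)
```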

Proof of Lemma 2.5: Fix a scenario $\sigma$ for the duration of the proof. Let $x^*$ be an optimal solution for $(\varepsilon n, (1 - \varepsilon))$-LP in complementary slackness with $p^s$. If $p^s a^t > \pi_t$, the corresponding constraint in the dual is loose and by complementary slackness we get $x^*_t = 0$. If $p^s a^t < \pi_t$, then for dual feasibility we have $\alpha^*_t > 0$ and by complementary slackness we have $x^*_t = 1$.

From the definition of $x^s$ we get that $x^s \le x^*$ and, since the $a^t$'s are non-negative, the feasibility of $x^*$ implies that $a_i^S(x^s) \le (1 - \varepsilon)B$ for all $i \in [m]$. Moreover, from our assumption that the input is in general position we get that there are at most $m$ values of $t$ such that $p^s a^t = \pi_t$. Therefore, $x^s$ and $x^*$ differ in at most $m$ positions and from primal complementary slackness we get that whenever $p^s_i > 0$,
$$a_i^S(x^s) \ge a_i^S(x^*) - \frac{m}{\varepsilon} = (1 - \varepsilon)B - \frac{m}{\varepsilon} \ge (1 - 2\varepsilon)B,$$
where the last inequality follows from the fact that $B \ge m/\varepsilon^2$. This concludes the proof of the lemma.



C Proof of Lemma 2.8 



The following simple inequalities will be helpful. 

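Observation C.1 For $\varepsilon, \alpha, \beta > 0$, $\frac{1 - \alpha\varepsilon}{1 + \beta\varepsilon} \ge 1 - (\alpha + \beta)\varepsilon$ and $\frac{1 - \alpha\varepsilon}{1 - \beta\varepsilon} \le 1 - (\alpha - \beta)\varepsilon$ (the latter when $\alpha \ge \beta$ and $\beta\varepsilon < 1$).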

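Combining equations (2.1), (2.2) and (2.3) and union bounding over all terms in the disjunction, we have that
$$\Pr(x^s \text{ is bad}) \le \sum_{i \in [m]}\left(\sum_{w \in \mathcal{W}_i^+}\Pr(\mathrm{skewm}(\varepsilon, w)) + \sum_{w \in \mathcal{W}_i^-}\Pr(\mathrm{skewp}(\varepsilon, w))\right).$$
Thus, it suffices to show that for all $w \in \mathcal{W}_i^+$ (respectively $w \in \mathcal{W}_i^-$), the event $\mathrm{skewm}(\varepsilon, w)$ (resp. $\mathrm{skewp}(\varepsilon, w)$) occurs with probability at most $2\exp(-\varepsilon^3 B/36)$.

Take $w \in \mathcal{W}_i^+$. By definition of this set, $a_i(w) \ge (1 - \varepsilon/2)B$, so the event $\mathrm{skewm}(\varepsilon, w)$ is contained in the event that $a_i^S(w) < (1 - \varepsilon)a_i(w)/(1 - \varepsilon/2)$, which is contained in the event $a_i^S(w) < (1 - \varepsilon/2)a_i(w)$. Using Corollary A.2 with $\tau = \varepsilon^2 a_i(w)/2$, we obtain that $\Pr(\mathrm{skewm}(\varepsilon, w)) \le 2\exp(-\varepsilon^3 a_i(w)/(16 + 2\varepsilon)) \le 2\exp(-\varepsilon^3 B/36)$, where the last step uses $a_i(w) \ge B/2$ and $\varepsilon \le 1$.

Similarly, take $w \in \mathcal{W}_i^-$, so that $a_i(w) \le (1 - 5\varepsilon/2)B$. It is easy to check that the event $\mathrm{skewp}(\varepsilon, w)$ is contained in the event $a_i^S(w) \ge a_i(w) + \varepsilon B/2$, so using Corollary A.2 with $\tau = \varepsilon^2 B/2$ we get that $\Pr(\mathrm{skewp}(\varepsilon, w)) \le 2\exp(-\varepsilon^3 B/(16 + 2\varepsilon)) \le 2\exp(-\varepsilon^3 B/36)$. This concludes the proof of the lemma.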



D Proof of Lemma 2.10 



Fix $j \in [K]$. Consider a set $x \in \mathcal{X}$ and let $p$ be a dual vector such that $x(p) = x$. Let $t'$ be the last index of $C_j$ which belongs to $x|_{C_j}$; this implies that $\pi_{t'} > pa^{t'} = (pc^j)\|a^{t'}\|_\infty$, or alternatively $\frac{\pi_{t'}}{\|a^{t'}\|_\infty} > pc^j$. By the ordering of the columns, for all $t \in C_j$ smaller than $t'$ we have $\frac{\pi_t}{\|a^t\|_\infty} \ge \frac{\pi_{t'}}{\|a^{t'}\|_\infty} > pc^j$ and hence $t \in x|_{C_j}$.

By definition of $t'$ it follows that $x|_{C_j} = \{t \in C_j : t \le t'\}$, a prefix of $C_j$; this concludes the proof.
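Lemma 2.10 can be illustrated on a single 1-dim subspace. In this Python sketch (hypothetical names; exact rational arithmetic via `fractions.Fraction` to avoid ties caused by rounding), every column is a positive multiple $k_t c$ of a fixed vector $c$ with $\|c\|_\infty = 1$, so $\|a^t\|_\infty = k_t$; after sorting by decreasing $\pi_t/\|a^t\|_\infty$, the set accepted by a price vector $p$ is checked to be a prefix.

```python
from fractions import Fraction


def accepted(p, columns):
    """Indices t with pi_t > p . a^t, i.e. the classification x(p)."""
    return {t for t, (pi, a) in enumerate(columns)
            if pi > sum(pk * ak for pk, ak in zip(p, a))}


def is_prefix_in_subspace(p, c, scales, profits):
    """Columns a^t = scales[t] * c lie on one ray (||c||_inf = 1 assumed).
    Sort by decreasing pi_t / ||a^t||_inf and test whether x(p) is a prefix."""
    columns = [(pi, [k * ci for ci in c]) for pi, k in zip(profits, scales)]
    order = sorted(range(len(columns)),
                   key=lambda t: Fraction(profits[t], scales[t]), reverse=True)
    chosen = accepted(p, columns)
    flags = [t in chosen for t in order]
    # A prefix never has a rejected column followed by an accepted one.
    return not any(later and not earlier for earlier, later in zip(flags, flags[1:]))


c = [Fraction(1), Fraction(1, 3)]
print(is_prefix_in_subspace([2, 1], c, scales=[3, 1, 2, 5, 4], profits=[7, 2, 5, 6, 9]))
```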
E Proof of Lemma 2.11



We prove that $\mathcal{W}_i^+$ is a witness set for $\mathcal{X}_i^+$; the proof that $\mathcal{W}_i^-$ is a witness set for $\mathcal{X}_i^-$ is analogous.

First, we claim that for all $x \in \mathcal{X}_i^+$, there is $x' \in \mathcal{X}$ such that $x' \subseteq x$ and $a_i(x') \in [B, B + m]$. To see this, let $p$ be such that $x = x(p)$. For $\lambda \ge 0$, define $p^\lambda = p + \lambda e_i$, where $e_i$ denotes the $i$th canonical vector. We have that $a_i(x(p^0)) > B$ (since $x(p) \in \mathcal{X}_i^+$) and $a_i(x(p^\infty)) = 0$ (since columns with $a^t_i > 0$ will at some point have $p^\lambda a^t > \pi_t$). Due to the assumption that the input is in general position, whenever $a_i(x(p^\lambda))$ is discontinuous (as a function of $\lambda \ge 0$) the right and the left limits differ by at most $m$. It then follows that there is $\lambda \ge 0$ such that $a_i(x(p^\lambda)) \in [B, B + m]$, and since $x(p^\lambda) \subseteq x$ for all $\lambda \ge 0$ (increasing $\lambda$ only raises the prices $p^\lambda a^t$) the claim follows.

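So take a classification $x \in \mathcal{X}_i^+$ and let $x'$ be as above. The fact that $a_i(x') \le B + m$ and the non-negativity of the $a^t$'s imply that there is an $\ell \in L^K$ such that $x' \in \mathcal{B}^i_\ell$. Since $w^\ell$ is the unique smallest set in $\mathcal{B}^i_\ell$, clearly $w^\ell \subseteq x' \subseteq x$. To show that $w^\ell \in \mathcal{W}_i^+$, it suffices to argue that $a_i(w^\ell) \ge (1 - \varepsilon/2)B$.

Since $w^\ell, x' \in \mathcal{B}^i_\ell$, for all $j$ such that $\ell_j > 0$ we have $a_i(w^\ell|_{C_j}) \ge a_i(x'|_{C_j})/(1 + \varepsilon/4)$. Moreover, for $j$ such that $\ell_j = 0$ we have $a_i(x'|_{C_j}) < \frac{\varepsilon B}{4K}$. Adding over all $j \in [K]$ gives
$$a_i(w^\ell) \ge \frac{1}{1 + \varepsilon/4}\sum_{j : \ell_j > 0} a_i(x'|_{C_j}) = \frac{a_i(x') - \sum_{j : \ell_j = 0} a_i(x'|_{C_j})}{1 + \varepsilon/4} \ge \frac{(1 - \varepsilon/4)B}{1 + \varepsilon/4} \ge \left(1 - \frac{\varepsilon}{2}\right)B,$$
where the third inequality follows from Observation C.1. Thus, $w^\ell \in \mathcal{W}_i^+$.

Since this property holds for all $x \in \mathcal{X}_i^+$, we conclude that $\mathcal{W}_i^+$ is a witness set for $\mathcal{X}_i^+$.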



F Proof of Lemma 2.12 



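Recall the definitions of $\mathcal{P}^\ell$ (for $\ell \in L^K$) and $\mathcal{P}^j_\ell$ (for $j \in [K]$, $\ell \in L$). It suffices to prove that at most $\left(O\left(\frac{K}{\varepsilon}\log\frac{K}{\varepsilon}\right)\right)^m$ of the families $\mathcal{P}^\ell$'s are non-empty.

Since $x(p) \in \mathcal{B}^i_\ell$ if and only if for all $j \in [K]$ we have $x(p)|_{C_j} \in \mathcal{B}^i_{\ell_j}(j)$, it follows that $\mathcal{P}^\ell = \bigcap_j \mathcal{P}^j_{\ell_j}$. Let $\tau^j_\ell$ denote the first index in $C_j$ such that the prefix $\{t \in C_j : t \le \tau^j_\ell\}$ occupies the budget $i$ to an extent in $I_\ell$. Using Lemma 2.10 and the fact that the $a^t$'s are non-negative, we get that $\mathcal{B}^i_\ell(j)$ is the set of all prefixes of $C_j$ which contain $\tau^j_\ell$ but do not contain $\tau^j_{\ell+1}$. Moreover, notice that the set $x(p)|_{C_j}$ contains $\tau^j_\ell$ if and only if $\pi_{\tau^j_\ell} > pa^{\tau^j_\ell}$. It then follows from these observations that we can express the set $\mathcal{P}^j_\ell$ using linear inequalities:
$$\mathcal{P}^j_\ell = \{p \in \mathbb{R}^m_+ : \pi_{\tau^j_\ell} > pa^{\tau^j_\ell},\ \pi_{\tau^j_{\ell+1}} \le pa^{\tau^j_{\ell+1}}\}.$$
Since $\mathcal{P}^\ell = \bigcap_j \mathcal{P}^j_{\ell_j}$, we have that $\mathcal{P}^\ell$ is given by the intersection of halfspaces defined by hyperplanes of the form $\pi_{\tau^j_\ell} = pa^{\tau^j_\ell}$ and $p_k = 0$ ($k \in [m]$).

So consider the arrangement given by all hyperplanes $\{\pi_{\tau^j_\ell} = pa^{\tau^j_\ell}\}_{j \in [K], \ell \in L}$ and $\{p_k = 0\}_{k=1}^m$. Given a face $F$ in this arrangement and a set $\mathcal{P}^\ell$, either $F$ is contained in $\mathcal{P}^\ell$ or these sets are disjoint. Since the faces of the arrangement cover $\mathbb{R}^m$, it follows that each non-empty $\mathcal{P}^\ell$ contains at least one of these faces; moreover, since the $\mathcal{P}^\ell$'s are pairwise disjoint, distinct non-empty $\mathcal{P}^\ell$'s contain distinct faces.

Notice that the arrangement is defined by $K|L| + m \le O\left(\frac{K}{\varepsilon}\log\frac{K}{\varepsilon}\right)$ hyperplanes, where the last inequality uses the fact that $\log(1 + \frac{\varepsilon}{4}) \ge \varepsilon\log(1 + \frac{1}{4})$ holds (by concavity) for $\varepsilon \in [0, 1]$. It is known that an arrangement with $h \ge m$ hyperplanes in $\mathbb{R}^m$ has at most $(O(h/m))^m$ faces (see Section 6.1 of [22] and page 82 of [23]). Using the conclusion of the previous paragraph, we get that there are at most $\left(O\left(\frac{K}{\varepsilon}\log\frac{K}{\varepsilon}\right)\right)^m$ non-empty $\mathcal{P}^\ell$'s and the result follows.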
G Proof of Lemma 3.1



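Let LP1 denote the LP with columns $(\pi_t, \tilde a^t)$ and right-hand side $(1 - \varepsilon)B$ and LP2 denote the LP with columns $(\pi_t, a^t)$ and right-hand side $B$.

Let $x$ be an $\varepsilon$-approximate solution for LP1. Notice that we can upper bound $\|a^t - \tilde a^t\|_\infty$ as a function of $\|\tilde a^t\|_\infty$:
$$\|a^t\|_\infty \le \|\tilde a^t\|_\infty + \|a^t - \tilde a^t\|_\infty \le \|\tilde a^t\|_\infty + \delta\|a^t\|_\infty,$$
where the first inequality follows from the triangle inequality and the second from the property required in Lemma 3.1. Rearranging, $\|a^t - \tilde a^t\|_\infty \le \delta\|a^t\|_\infty \le \frac{\delta}{1 - \delta}\|\tilde a^t\|_\infty$, which for the choice of $\delta$ in Lemma 3.1 gives $\|a^t - \tilde a^t\|_\infty \le \frac{\varepsilon}{m}\|\tilde a^t\|_\infty$. Given this bound, it is easy to see that $x$ is feasible for LP2: for every $i \in [m]$,
$$\sum_t a^t_i x_t \le \sum_t \tilde a^t_i x_t + \sum_t \|a^t - \tilde a^t\|_\infty x_t \le (1 - \varepsilon)B + \frac{\varepsilon}{m}\sum_t \|\tilde a^t\|_\infty x_t \le B,$$
where the last inequality uses the fact that $\sum_t \|\tilde a^t\|_\infty x_t \le \sum_{i'}\sum_t \tilde a^t_{i'} x_t \le mB$, since $x$ is a feasible solution and the $\tilde a^t$'s are non-negative.

In order to show that $x$ is a $2\varepsilon$-approximate solution for LP2, it suffices to show that the optimum of LP1 is at least $1/(1 + \varepsilon)$ times the optimum of LP2, since then $x$ will be within a factor of $(1 - \varepsilon)/(1 + \varepsilon) \ge (1 - 2\varepsilon)$ of the optimum of LP2. So let $x^*$ be an optimal solution for LP2. Using the same argument as before, it is easy to see that $x^*/(1 + \varepsilon)$ is feasible for LP1; this concludes the proof of the lemma.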
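The feasibility part of the argument can be checked numerically. The Python sketch below draws random non-negative columns $\tilde a^t$, perturbs them into non-negative columns $a^t$ with $\|a^t - \tilde a^t\|_\infty \le \frac{\varepsilon}{m}\|\tilde a^t\|_\infty$ (the bound derived above), builds a greedy 0/1 solution that is feasible for LP1, and reports the largest row usage of the same solution under the columns $a^t$. All names and instance sizes are illustrative assumptions; the chain of inequalities above guarantees that this usage never exceeds $B$.

```python
import random


def check_feasibility_transfer(n, m, B, eps, seed=0):
    """Return (max row usage under the columns a^t, x) for a greedy x that is
    feasible for LP1 (columns tilde a^t, right-hand side (1 - eps) B)."""
    rng = random.Random(seed)
    tilde = [[rng.random() for _ in range(m)] for _ in range(n)]
    orig = []
    for at in tilde:
        radius = (eps / m) * max(at)      # allowed l_inf distance between a^t and tilde a^t
        orig.append([max(0.0, v + rng.uniform(-radius, radius)) for v in at])
    used = [0.0] * m
    x = [0] * n
    for t, at in enumerate(tilde):        # greedy: accept while LP1 stays feasible
        if all(u + v <= (1 - eps) * B for u, v in zip(used, at)):
            used = [u + v for u, v in zip(used, at)]
            x[t] = 1
    usage = [sum(orig[t][i] * x[t] for t in range(n)) for i in range(m)]
    return max(usage), x


worst, x = check_feasibility_transfer(n=200, m=5, B=10.0, eps=0.1)
print(worst, sum(x))
```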



H Proof of Lemma 4.1

The proof uses the same ideas used in the analysis of OTP, although some definitions need to be changed 
slightly. 

Recall that $S = \{\sigma(1), \sigma(2), \ldots, \sigma(s)\}$, $T = \{\sigma(s + 1), \sigma(s + 2), \ldots, \sigma(2s)\}$ and $U = S \cup T$. Again we use $p^s$ to denote the dual vector used by $(s, \delta)$-OTP for its classification, and set $x^s = x(p^s)$. With slight abuse of notation, we often see $x^s$ as a (possibly infeasible) solution for (2s, 1)-LP, which means that we truncate the vector $x^s$ to the first $2s$ coordinates $x^s_{\sigma(1)}, \ldots, x^s_{\sigma(2s)}$.

As before, we focus on proving the following lemma; the proof that this lemma implies Lemma 4.1 is 
presented at the end of this section. 

Lemma H.1 Suppose that there are $K \ge m$ 1-dim subspaces of $\mathbb{R}^m$ containing the columns $a^t$'s. Fix an integer $s$ and a real number $\delta \in (0, 1/10)$ such that $\frac{\delta^2 sB}{n} \ge \Omega\left(m \ln\frac{K}{\delta}\right)$. Then with probability at least $(1 - \delta^2)$, $x^s$ satisfies $a_i^T(x^s) \le B$ for all $i \in [m]$ and has value $\pi_U x^s \ge (1 - 3\delta)\mathrm{OPT}(2s)$.

In a given scenario, we now say that $x^s$ is bad if $a_i^T(x^s) > B$ for some $i \in [m]$ or if $\pi_U x^s < (1 - 3\delta)\mathrm{OPT}(2s)$. In this scenario, a classification $x \in \mathcal{X}$ can now be badly learned for budget $i$ due to infeasibility if $a_i^S(x) \le (1 - \delta)B$ and $a_i^T(x) > B$; $x$ can be badly learned for budget $i$ due to value if $a_i^S(x) \ge (1 - 2\delta)B$ and $a_i^U(x) < (1 - 3\delta)B$. Then $x$ can be badly learned for budget $i$ if it falls into any of the above cases. The following is the appropriate modification of Lemma 2.4 for our current setting, and can be proved exactly in the same way.

Lemma H.2 Consider a scenario where $x^s$ satisfies the following: (i) for all $i \in [m]$, $a_i^T(x^s) \le B$ and (ii) for all $i \in [m]$ with $p^s_i > 0$, $a_i^U(x^s) \ge (1 - 3\delta)B$. Then $x^s$ is good.



Due to our definitions, this lemma implies that inequality (2.1) still holds.
Witness sets. In the analysis of OTP, each $x \in \mathcal{X}$ could be badly learned for budget $i$ due to either infeasibility or (exclusively) due to value, which motivated the definitions of $\mathcal{X}_i^+$ and $\mathcal{X}_i^-$. Now the same $x$ can be badly learned for budget $i$ due to both conditions. Therefore, we introduce two different partitions of $\mathcal{X}$, which tell why a classification is unlikely to be badly learned due to the appropriate condition. That is, we define
$$\mathcal{X}_i^+ = \{x \in \mathcal{X} : a_i(x) \ge (1 - \delta)B + \tfrac{\delta B}{2}\}, \qquad \mathcal{Y}_i^- = \{x \in \mathcal{X} : a_i(x) < (1 - \delta)B + \tfrac{\delta B}{2}\}$$
as the partition associated to the infeasibility condition and
$$\mathcal{X}_i^- = \{x \in \mathcal{X} : a_i(x) < (1 - 2\delta)B - \tfrac{\delta B}{2}\}, \qquad \mathcal{Y}_i^+ = \{x \in \mathcal{X} : a_i(x) \ge (1 - 2\delta)B - \tfrac{\delta B}{2}\}$$
as the partition associated to the value condition. For example, $\mathcal{Y}_i^-$ is the set of classifications which are unlikely to be infeasible because of a small $a_i(\cdot)$ value. Also, note that these partitions are all based on the total budget occupation rather than on the budget occupation in the first $2s$ columns only.

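Given this more refined tagging of elements in $\mathcal{X}$, we also need to redefine witness sets. We say that $(\mathcal{W}_i^+, \mathcal{W}_i^-, \mathcal{Z}_i^+, \mathcal{Z}_i^-)$ are witness sets for $(\mathcal{X}_i^+, \mathcal{X}_i^-, \mathcal{Y}_i^+, \mathcal{Y}_i^-)$ respectively if they satisfy the following:
$$\begin{aligned}
& w \in \mathcal{W}_i^+ \Rightarrow a_i(w) \ge (1 - \delta)B + \tfrac{\delta B}{4}, && x \in \mathcal{X}_i^+ \Rightarrow \exists w \in \mathcal{W}_i^+ : w \subseteq x,\\
& w \in \mathcal{Z}_i^+ \Rightarrow a_i(w) \ge (1 - 2\delta)B - \tfrac{3\delta B}{4}, && x \in \mathcal{Y}_i^+ \Rightarrow \exists w \in \mathcal{Z}_i^+ : w \subseteq x,\\
& w \in \mathcal{W}_i^- \Rightarrow a_i(w) \le (1 - 2\delta)B - \tfrac{\delta B}{4}, && x \in \mathcal{X}_i^- \Rightarrow \exists w \in \mathcal{W}_i^- : x \subseteq w,\\
& w \in \mathcal{Z}_i^- \Rightarrow a_i(w) \le (1 - \delta)B + \tfrac{3\delta B}{4}, && x \in \mathcal{Y}_i^- \Rightarrow \exists w \in \mathcal{Z}_i^- : x \subseteq w.
\end{aligned}$$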

Again to simplify the notation, given a set $x$ we define $\mathrm{skewm}^S(\delta, x)$ to be the event that $a_i^S(x) < (1 - \delta)B$, $\mathrm{skewp}^S(\delta, x)$ to be the event that $a_i^S(x) > (1 - \delta)B$, and similarly replacing the set $S$ by the sets $T$ and $U$. The following expression, which is analogous to (2.2)-(2.3), establishes the connection between the events where classifications can be badly learned and witness sets:
$$\bigvee_{x \in \mathcal{X}}\{x \text{ can be badly learned for budget } i\} \subseteq \bigvee_{w \in \mathcal{W}_i^+}\mathrm{skewm}^S(\delta, w) \vee \bigvee_{w \in \mathcal{Z}_i^+}\mathrm{skewm}^U(3\delta, w) \vee \bigvee_{w \in \mathcal{W}_i^-}\mathrm{skewp}^S(2\delta, w) \vee \bigvee_{w \in \mathcal{Z}_i^-}\mathrm{skewp}^T(0, w). \tag{H.2}$$

To see that this expression holds, take $x \in \mathcal{X}$. Suppose that $x \in \mathcal{X}_i^+$ and let $w \in \mathcal{W}_i^+$ be contained in $x$. Then the event $\{x$ can be badly learned for budget $i$ due to infeasibility$\}$ is contained in $\mathrm{skewm}^S(\delta, w)$. Similarly, if $x \in \mathcal{Y}_i^-$, let $w \in \mathcal{Z}_i^-$ contain $x$; then the event $\{x$ can be badly learned for budget $i$ due to infeasibility$\}$ is contained in $\mathrm{skewp}^T(0, w)$. The reasoning for the event $\{x$ can be badly learned for budget $i$ due to value$\}$ is similar.



The following is analogous to Lemma 2.8.



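Lemma H.3 Suppose that, for all $i \in [m]$, there are witness sets for $(\mathcal{X}_i^+, \mathcal{X}_i^-, \mathcal{Y}_i^+, \mathcal{Y}_i^-)$ of size at most $M$. Then $\Pr(x^s \text{ is bad}) \le 8mM\exp\left(-\Omega\left(\frac{\delta^2 sB}{n}\right)\right)$.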
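Good witness sets. We now construct witness sets of size at most $\left(O\left(\frac{K}{\delta}\log\frac{K}{\delta}\right)\right)^m$, so Lemma H.1 will follow directly from Lemma H.3. The development mirrors that of Section 2. Let $C_1, C_2, \ldots, C_K$ be a partition of the index set $[n]$ such that for all $j$, the columns $\{a^t\}_{t \in C_j}$ belong to the same 1-dimensional subspace.

Cover the interval $[0, B + m]$ with intervals $\{I_\ell\}_{\ell \in L}$, where $I_0 = [0, \frac{\delta B}{8K})$ and $I_\ell = [\frac{\delta B}{8K}(1 + \frac{\delta}{8})^{\ell - 1}, \frac{\delta B}{8K}(1 + \frac{\delta}{8})^{\ell})$ for $\ell > 0$, and $L = \{0, \ldots, \lceil\log_{1 + \delta/8}\frac{8K(B + m)}{\delta B}\rceil + 1\}$. Define $\mathcal{B}^i_\ell(j)$ as the set of classifications $x \in \mathcal{X}|_{C_j}$ whose occupation $a_i(x)$ lies in the interval $I_\ell$. Finally, for $\ell \in L^K$, define the family of boxes $\mathcal{B}^i_\ell = \prod_j \mathcal{B}^i_{\ell_j}(j)$. Given $\ell \in L$, let $\underline{w}^\ell(j)$ be the smallest set in $\mathcal{X}|_{C_j}$ which has $a_i(\underline{w}^\ell(j)) \in I_\ell$, and for $\ell \in L^K$ define the set $\underline{w}^\ell$ as the union of the sets $\underline{w}^{\ell_j}(j)$ (or equivalently, as the concatenation of the vectors $\underline{w}^{\ell_j}(j)$). Similarly, for $\ell \in L$ let $\overline{w}^\ell(j)$ be the largest set in $\mathcal{X}|_{C_j}$ which has $a_i(\overline{w}^\ell(j)) \in I_\ell$, and for $\ell \in L^K$ define the set $\overline{w}^\ell$ as the union of the sets $\overline{w}^{\ell_j}(j)$.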
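The geometric interval cover can be sketched in Python. Here `theta` plays the role of the length of $I_0$ and `gamma` the growth rate, so that $I_\ell = [\theta(1 + \gamma)^{\ell - 1}, \theta(1 + \gamma)^\ell)$; with the choices above, $\theta = \frac{\delta B}{8K}$ and $\gamma = \frac{\delta}{8}$. The function names and the example values of $K$, $\delta$, $B$, $m$ are hypothetical; the sketch only shows how an occupation value is mapped to its interval and how many intervals are needed to reach $B + m$.

```python
import math


def interval_index(v, theta, gamma):
    """Index ell of the interval containing v >= 0, where I_0 = [0, theta) and
    I_ell = [theta (1 + gamma)^(ell - 1), theta (1 + gamma)^ell) for ell >= 1."""
    if v < theta:
        return 0
    ell = 1 + math.floor(math.log(v / theta) / math.log1p(gamma))
    # Correct possible off-by-one errors caused by floating-point logarithms.
    while v < theta * (1 + gamma) ** (ell - 1):
        ell -= 1
    while v >= theta * (1 + gamma) ** ell:
        ell += 1
    return ell


def interval_count(theta, gamma, top):
    """|L| for L = {0, ..., ceil(log_{1+gamma}(top / theta)) + 1}."""
    return math.ceil(math.log(top / theta) / math.log1p(gamma)) + 2


# Example: K = 4, delta = 0.1, B = 100, m = 3, so theta = delta B / (8K), gamma = delta / 8.
theta, gamma, top = 0.1 * 100 / 32, 0.1 / 8, 100 + 3
print(interval_count(theta, gamma, top), interval_index(50.0, theta, gamma))
```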

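Now we construct the witness sets as before. Set $\mathcal{W}_i^+ = \{\underline{w}^\ell : a_i(\underline{w}^\ell) \ge (1 - \delta)B + \frac{\delta B}{4},\ \mathcal{B}^i_\ell \cap \mathcal{X}_i^+ \neq \emptyset\}$, set $\mathcal{Z}_i^+ = \{\underline{w}^\ell : a_i(\underline{w}^\ell) \ge (1 - 2\delta)B - \frac{3\delta B}{4},\ \mathcal{B}^i_\ell \cap \mathcal{Y}_i^+ \neq \emptyset\}$, set $\mathcal{W}_i^- = \{\overline{w}^\ell : a_i(\overline{w}^\ell) \le (1 - 2\delta)B - \frac{\delta B}{4},\ \mathcal{B}^i_\ell \cap \mathcal{X}_i^- \neq \emptyset\}$ and finally set $\mathcal{Z}_i^- = \{\overline{w}^\ell : a_i(\overline{w}^\ell) \le (1 - \delta)B + \frac{3\delta B}{4},\ \mathcal{B}^i_\ell \cap \mathcal{Y}_i^- \neq \emptyset\}$.

Following the same steps as in the proof of Lemma 2.11, one can check that $(\mathcal{W}_i^+, \mathcal{W}_i^-, \mathcal{Z}_i^+, \mathcal{Z}_i^-)$ are witness sets for $(\mathcal{X}_i^+, \mathcal{X}_i^-, \mathcal{Y}_i^+, \mathcal{Y}_i^-)$. Moreover, the proof of Lemma 2.12 can be used to show that, for a fixed $i \in [m]$, at most $\left(O\left(\frac{K}{\delta}\log\frac{K}{\delta}\right)\right)^m$ of the boxes $\mathcal{B}^i_\ell$ contain an element of $\mathcal{X}$, which then imposes the same upper bound on the size of the witness sets. This concludes the proof of Lemma H.1.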
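Proof of Lemma 4.1. Let $x$ be the solution returned by $(s, \delta)$-OTP and let $\mathcal{E}$ denote the event that $x^s$ is good. For any scenario in $\mathcal{E}$, we have $x_{\sigma(t)} = x^s_{\sigma(t)}$ for all $t = s + 1, s + 2, \ldots, 2s$. Therefore, since $x_{\sigma(t)} = 0$ for $t \le s$ and the $\pi_t$'s are non-negative, we get that
$$\begin{aligned}
\mathbb{E}[\pi_U x] = \mathbb{E}\Big[\sum_{t=1}^{2s}\pi_{\sigma(t)}x_{\sigma(t)}\Big]
&\ge \mathbb{E}\Big[\sum_{t=s+1}^{2s}\pi_{\sigma(t)}x^s_{\sigma(t)} \,\Big|\, \mathcal{E}\Big]\Pr(\mathcal{E})\\
&\ge \mathbb{E}\Big[\sum_{t=1}^{2s}\pi_{\sigma(t)}x^s_{\sigma(t)} \,\Big|\, \mathcal{E}\Big]\Pr(\mathcal{E}) - \mathbb{E}[\mathrm{OPT}(s) \mid \mathcal{E}]\Pr(\mathcal{E})\\
&\ge \mathbb{E}\Big[\sum_{t=1}^{2s}\pi_{\sigma(t)}x^s_{\sigma(t)} \,\Big|\, \mathcal{E}\Big]\Pr(\mathcal{E}) - \mathbb{E}[\mathrm{OPT}(s)].
\end{aligned} \tag{H.3}$$

To lower bound the first term in the right-hand side we use again the definition of $\mathcal{E}$:
$$\mathbb{E}\Big[\sum_{t=1}^{2s}\pi_{\sigma(t)}x^s_{\sigma(t)} \,\Big|\, \mathcal{E}\Big]\Pr(\mathcal{E}) \ge (1 - 3\delta)\mathbb{E}[\mathrm{OPT}(2s) \mid \mathcal{E}]\Pr(\mathcal{E})$$
and
$$\mathbb{E}[\mathrm{OPT}(2s)] = \mathbb{E}[\mathrm{OPT}(2s) \mid \mathcal{E}]\Pr(\mathcal{E}) + \mathbb{E}[\mathrm{OPT}(2s) \mid \bar{\mathcal{E}}]\Pr(\bar{\mathcal{E}}) \le \mathbb{E}[\mathrm{OPT}(2s) \mid \mathcal{E}]\Pr(\mathcal{E}) + \delta^2\mathrm{OPT},$$
where the last inequality uses Lemma H.1 (which gives $\Pr(\bar{\mathcal{E}}) \le \delta^2$) and $\mathrm{OPT}(2s) \le \mathrm{OPT}$.

Combining the previous two inequalities gives that $\mathbb{E}[\sum_{t=1}^{2s}\pi_{\sigma(t)}x^s_{\sigma(t)} \mid \mathcal{E}]\Pr(\mathcal{E}) \ge (1 - 3\delta)\mathbb{E}[\mathrm{OPT}(2s)] - \delta^2\mathrm{OPT}$, and the result follows from equation (H.3).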



I Proof of Theorem 4.2



Let LP1 denote the LP with columns $(\pi_t, \tilde a^t)$ and right-hand side $\tilde B = (1 - \varepsilon)B$ and LP2 denote the LP with columns $(\pi_t, a^t)$ and right-hand side $B$. We show that Robust DPA returns a $(1 - 21.5\varepsilon)$-approximation for LP1, and the theorem will follow from Lemma 3.1.
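First we show that the returned solution $x$ is feasible for LP1. By definition of the algorithm, $a_j(x^i) \le \varepsilon 2^i\tilde B$ for all $i, j$. By linearity,
$$a_j(x) = \sum_{i=0}^{\log(1/\varepsilon) - 1} a_j(x^i) \le \varepsilon\tilde B\sum_{i=0}^{\log(1/\varepsilon) - 1} 2^i < \tilde B.$$

In order to verify the value of the returned solution, we first show that $\frac{\delta^2 s\tilde B}{n} \ge \Omega\left(m \ln\frac{K}{\delta}\right)$ in every call to $(s, \delta)$-OTP made by Robust DPA. As in Section 3, the columns $\tilde a^t$'s belong to at most $K = (O(m/\varepsilon))^m$ 1-dim subspaces. Since $B \ge \Omega\left(\frac{m^2}{\varepsilon^2}\ln\frac{m}{\varepsilon}\right)$, we have that for each $i = 0, \ldots, \log(1/\varepsilon) - 1$, setting $s = \varepsilon 2^i n$ and $\delta = \sqrt{\varepsilon/2^i}$ satisfies the expression $\frac{\delta^2 s\tilde B}{n} = \varepsilon^2\tilde B \ge \Omega\left(m \ln\frac{K}{\delta}\right)$. Then applying Lemma 4.1 we get that for all $i = 0, \ldots, \log(1/\varepsilon) - 1$,
$$\mathbb{E}[\pi x^i] \ge \left(1 - 3\sqrt{\tfrac{\varepsilon}{2^i}}\right)\mathbb{E}[\mathrm{OPT}(\varepsilon 2^{i+1} n)] - \mathbb{E}[\mathrm{OPT}(\varepsilon 2^i n)] - \frac{\varepsilon}{2^i}\mathrm{OPT}.$$
By linearity of the objective value and of expectations, and since $\mathrm{OPT}(\varepsilon 2^{\log(1/\varepsilon)} n) = \mathrm{OPT}(n) = \mathrm{OPT}$, the sum telescopes to
$$\mathbb{E}[\pi x] = \sum_i \mathbb{E}[\pi x^i] \ge -\mathbb{E}[\mathrm{OPT}(\varepsilon n)] - \sum_{i=0}^{\log(1/\varepsilon) - 2} 3\sqrt{\tfrac{\varepsilon}{2^i}}\,\mathbb{E}[\mathrm{OPT}(\varepsilon n 2^{i+1})] + (1 - 3\sqrt{2}\varepsilon - 2\varepsilon)\mathrm{OPT}.$$

Lemma 2.4 of [1] states that $\mathbb{E}[\mathrm{OPT}(s)] \le \frac{s}{n}\mathrm{OPT}$ for all $s > 0$. Employing this observation, and writing $\sqrt{\varepsilon 2^i} = 2^{-(\log(1/\varepsilon) - i)/2}$, we get
$$\mathbb{E}[\pi x] \ge \mathrm{OPT} - \varepsilon\mathrm{OPT}\left(3 + 3\sqrt{2} + 6\sum_{i=0}^{\log(1/\varepsilon) - 2} 2^{-(\log(1/\varepsilon) - i)/2}\right).$$
Since the summation in the expression can be upper bounded by $\sum_{k \ge 2} 2^{-k/2} = \frac{1}{2 - \sqrt{2}} < 1.71$, we get that $\mathbb{E}[\pi x] \ge (1 - 21.5\varepsilon)\mathrm{OPT}$. This concludes the proof of the theorem.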
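The final arithmetic can be double-checked numerically. The Python sketch below recomputes, for $\varepsilon = 2^{-L}$, the coefficient $c$ in the bound $\mathbb{E}[\pi x] \ge (1 - c\,\varepsilon)\mathrm{OPT}$ obtained after substituting $\mathbb{E}[\mathrm{OPT}(s)] \le \frac{s}{n}\mathrm{OPT}$ into the telescoped inequality, without bounding the summation; the function name is hypothetical. For every $L \ge 7$ (that is, $\varepsilon < 1/100$) the coefficient stays below 17.5, comfortably within the stated 21.5.

```python
import math


def dpa_loss_coefficient(L):
    """Coefficient c with E[pi x] >= (1 - c * eps) OPT for eps = 2^-L, obtained by
    substituting E[OPT(s)] <= (s / n) OPT into the telescoped bound."""
    eps = 2.0 ** -L
    deltas = [math.sqrt(eps / 2 ** i) for i in range(L)]   # delta_i = sqrt(eps / 2^i)
    loss = eps                                  # E[OPT(eps n)] <= eps OPT
    loss += sum(d * d for d in deltas)          # sum of the delta_i^2 OPT terms
    loss += 3 * deltas[-1]                      # last phase: 3 delta_{L-1} E[OPT(n)]
    loss += sum(3 * deltas[i] * eps * 2 ** (i + 1) for i in range(L - 1))
    return loss / eps


for L in (7, 10, 20):
    print(L, round(dpa_loss_coefficient(L), 3))
```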